Re: Utilizing YARN AM RPC port field

2016-06-15 Thread Mingyu Kim
FYI, I just filed https://issues.apache.org/jira/browse/SPARK-15974. Mingyu From: Mingyu Kim <m...@palantir.com> Date: Tuesday, June 14, 2016 at 2:13 PM To: Steve Loughran <ste...@hortonworks.com> Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Matt Cheah &

Re: Utilizing YARN AM RPC port field

2016-06-14 Thread Mingyu Kim
If there are no objections, I can file a bug and find time to tackle it myself. Mingyu From: Steve Loughran <ste...@hortonworks.com> Date: Tuesday, June 14, 2016 at 4:55 AM To: Mingyu Kim <m...@palantir.com> Cc: "dev@spark.apache.org" <dev@spark.apache.org>, Matt

Utilizing YARN AM RPC port field

2016-06-13 Thread Mingyu Kim
Hi all, YARN provides a way for an ApplicationMaster to register an RPC port so that a client outside the YARN cluster can reach the application for any RPCs, but Spark's YARN AMs simply register a dummy port number of 0. (See
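For context, a hedged sketch of what registering a real port would look like against the YARN client API. The `registerApplicationMaster(host, port, trackingUrl)` call is real YARN API; `yarnConf`, `driverHost`, `driverRpcPort`, and `trackingUrl` are illustrative names assumed to be in scope, and this is not Spark's actual AM code.

```scala
// Sketch: registering a real RPC port with the YARN ResourceManager,
// instead of the dummy port 0 that Spark's YARN AM currently passes.
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

val amRMClient = AMRMClient.createAMRMClient[ContainerRequest]()
amRMClient.init(yarnConf)   // yarnConf: a YarnConfiguration, assumed in scope
amRMClient.start()

// registerApplicationMaster(host, rpcPort, trackingUrl) -- passing the
// port the driver's RPC endpoint actually listens on, rather than 0.
val response = amRMClient.registerApplicationMaster(
  driverHost,     // e.g. the Spark driver host
  driverRpcPort,  // the real port, e.g. spark.driver.port as an Int
  trackingUrl)    // e.g. the Spark UI URL
```

A client outside the cluster could then discover the port from the RM's application report instead of needing out-of-band coordination.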

Re: Spark 1.6.1

2016-02-02 Thread Mingyu Kim
Cool, thanks! Mingyu From: Michael Armbrust <mich...@databricks.com> Date: Tuesday, February 2, 2016 at 10:48 AM To: Mingyu Kim <m...@palantir.com> Cc: Romi Kuntsman <r...@totango.com>, Hamel Kothari <hamelkoth...@gmail.com>, Ted Yu <yuzhih...@gmail.com>

Re: Spark 1.6.1

2016-02-02 Thread Mingyu Kim
Hi all, Is there an estimated timeline for 1.6.1 release? Just wanted to check how the release is coming along. Thanks! Mingyu From: Romi Kuntsman Date: Tuesday, February 2, 2016 at 3:16 AM To: Michael Armbrust Cc: Hamel Kothari

Re: And.eval short circuiting

2015-09-18 Thread Mingyu Kim
the guarantees of the optimizer. Is there a bug filed that tracks the change you suggested below, btw? I’d like to follow the issue, if there’s one. Thanks, Mingyu From: Reynold Xin Date: Wednesday, September 16, 2015 at 1:17 PM To: Zack Sampson Cc: "dev@spark.apache.org", Mingyu

Re: And.eval short circuiting

2015-09-18 Thread Mingyu Kim
I filed SPARK-10703. Thanks! Mingyu From: Reynold Xin Date: Thursday, September 17, 2015 at 11:22 PM To: Mingyu Kim Cc: Zack Sampson, "dev@spark.apache.org", Peter Faiman, Matt Cheah, Michael Armbrust Subject: Re: And.eval short circuiting Please file a ticket and cc me. Thanks.
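The thread is about whether `And.eval` always evaluates both children. A simplified, self-contained sketch (not the actual Catalyst source) of short-circuiting three-valued AND evaluation, where `null` models SQL NULL:

```scala
// Simplified sketch of short-circuiting SQL three-valued AND, in the
// spirit of Catalyst's And.eval: if the left side evaluates to false,
// the right side is never evaluated.
def evalAnd(left: => Any, right: => Any): Any = {
  val l = left
  if (l == false) {
    false                                   // short circuit: right untouched
  } else {
    val r = right
    if (r == false) false                   // null AND false == false
    else if (l == null || r == null) null   // true AND null == null
    else true
  }
}
```

The short circuit is what makes the evaluation order observable: a right-hand child with side effects (or one that would throw on certain inputs) behaves differently depending on whether the left child returned false first.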

Re: [discuss] new Java friendly InputSource API

2015-04-24 Thread Mingyu Kim
...@databricks.com Date: Thursday, April 23, 2015 at 11:09 AM To: Mingyu Kim <m...@palantir.com> Cc: Soren Macbeth <so...@yieldbot.com>, Punyashloka Biswal <punya.bis...@gmail.com>, dev@spark.apache.org

Re: [discuss] new Java friendly InputSource API

2015-04-23 Thread Mingyu Kim
Hi Reynold, You mentioned that the new API allows arbitrary code to be run on the driver side, but it's not very clear to me how this is different from what Hadoop API provides. In your example of using broadcast, did you mean broadcasting something in InputSource.getPartitions() and having

Re: Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Mingyu Kim
to be sent back to the driver when the task completes. - Patrick On Wed, Mar 4, 2015 at 4:01 PM, Mingyu Kim m...@palantir.com wrote: Hi all, It looks like the result of a task is serialized twice, once by the serializer (i.e. Java/Kryo depending on configuration) and once again by the closure serializer

Task result is serialized twice by serializer and closure serializer

2015-03-04 Thread Mingyu Kim
Hi all, It looks like the result of a task is serialized twice, once by the serializer (i.e. Java/Kryo depending on configuration) and once again by the closure serializer (i.e. Java). To link the actual code, The first one:
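A conceptual sketch of the two serialization steps being described, loosely modeled on the executor's task-completion path; the variable names here are illustrative, not the exact Spark source:

```scala
// Sketch of the double serialization: first the task's value is serialized
// with the configured result serializer (Java or Kryo) ...
val valueBytes = resultSerializer.serialize(taskValue)

// ... then the bytes are wrapped in a DirectTaskResult, which is itself
// serialized again -- with the closure serializer (always Java) -- before
// being sent back to the driver.
val directResult = new DirectTaskResult(valueBytes, accumUpdates)
val resultBytes = closureSerializer.serialize(directResult)
```

The outer pass mostly re-serializes an already-serialized byte buffer plus small metadata, which is why the overhead question in this thread arises.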

The default CDH4 build uses avro-mapred hadoop1

2015-02-20 Thread Mingyu Kim
Hi all, Related to https://issues.apache.org/jira/browse/SPARK-3039, the default CDH4 build, which is built with “mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package”, pulls in avro-mapred hadoop1, as opposed to avro-mapred hadoop2. This ends up in the same error as mentioned in
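One hedged workaround for builds hitting this: pin avro-mapred to its hadoop2 classifier in the consuming project's pom, so the hadoop2-flavored artifact is resolved instead of the hadoop1 default. The version shown is illustrative.

```xml
<!-- Sketch: forcing the hadoop2 flavor of avro-mapred via its Maven
     classifier; adjust the version to match the Avro version in use. -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-mapred</artifactId>
  <version>1.7.6</version>
  <classifier>hadoop2</classifier>
</dependency>
```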

Re: The default CDH4 build uses avro-mapred hadoop1

2015-02-20 Thread Mingyu Kim
Hadoop versions - I don't think it's quite right to have vendor-specific builds in Spark to begin with - We should be moving to only support Hadoop 2 soon IMHO anyway - CDH4 is EOL in a few months I think On Fri, Feb 20, 2015 at 8:30 AM, Mingyu Kim m...@palantir.com wrote: Hi all, Related to https

Re: Streaming partitions to driver for use in .toLocalIterator

2015-02-18 Thread Mingyu Kim
Another alternative would be to compress the partition in memory in a streaming fashion instead of calling .toArray on the iterator. Would it be an easier mitigation to the problem? Or, is it hard to compress the rows one by one without materializing the full partition in memory using the
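The incremental-compression idea above can be sketched in plain Scala/JDK code: serialize and gzip rows one at a time off the iterator, never materializing the partition as an array. This is an illustration of the suggestion, not Spark's actual implementation.

```scala
// Compress a partition's rows incrementally while iterating, instead of
// calling .toArray first and holding the whole partition in memory.
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.zip.GZIPOutputStream

def compressRows(rows: Iterator[AnyRef]): Array[Byte] = {
  val buf = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(new GZIPOutputStream(buf))
  try {
    rows.foreach(out.writeObject)  // one row at a time, never the full array
  } finally {
    out.close()                    // flushes and finishes the gzip stream
  }
  buf.toByteArray
}
```

Peak memory is then bounded by the compressed output plus one row, rather than by the uncompressed partition.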

Spark master OOMs with exception stack trace stored in JobProgressListener (SPARK-4906)

2014-12-19 Thread Mingyu Kim
Hi, I just filed a bug, SPARK-4906 (https://issues.apache.org/jira/browse/SPARK-4906), regarding Spark master OOMs. If I understand correctly, the UI states for all running applications are kept in memory retained by JobProgressListener, and when there are a lot of exception stack traces, this UI
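As a possible mitigation (availability depends on the Spark version in use), the amount of UI state retained can be bounded in `spark-defaults.conf`:

```properties
# Hedged mitigation: cap the job/stage history the UI listener retains,
# limiting how many stored stack traces can accumulate in memory.
spark.ui.retainedJobs    200
spark.ui.retainedStages  200
```

This trades UI history depth for bounded memory, rather than fixing the underlying retention of large exception strings.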

Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-15 Thread Mingyu Kim
Wendell pwend...@gmail.com Date: Thursday, August 14, 2014 at 6:32 PM To: Gary Malouf malouf.g...@gmail.com Cc: Mingyu Kim m...@palantir.com, dev@spark.apache.org dev@spark.apache.org Subject: Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster I commented

[SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-14 Thread Mingyu Kim
I ran a really simple program that runs with the Spark 1.0.2 jar and connects to a Spark 1.0.1 cluster, but it fails with java.io.InvalidClassException. I filed the bug at https://issues.apache.org/jira/browse/SPARK-3050. I assumed the minor and patch releases shouldn't break compatibility. Is that
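For background on the failure mode: `java.io.InvalidClassException` typically means the two sides computed different `serialVersionUID`s for the same class. A minimal Scala sketch of how a class stays wire-compatible across versions (illustrative, not a Spark class):

```scala
// Java serialization compares serialVersionUID values on deserialization.
// Declaring it explicitly keeps compatible revisions of a class
// interchangeable on the wire.
@SerialVersionUID(1L)
class Message(val body: String) extends Serializable

// Without the annotation, the JVM derives the UID from the class's shape,
// so any field or method change between two builds (e.g. a class compiled
// in 1.0.1 vs. 1.0.2) produces a mismatch and InvalidClassException.
```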