Re: SPARK-13843 Next steps

2016-03-22 Thread Cody Koeninger
I'm in favor of everything in /extras and /external being removed, but I'm more in favor of making a decision and moving on. On Tue, Mar 22, 2016 at 12:20 PM, Marcelo Vanzin wrote: > +1 for getting flume back. > > On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis

Re: SPARK-13843 Next steps

2016-03-22 Thread Marcelo Vanzin
+1 for getting flume back. On Tue, Mar 22, 2016 at 12:27 AM, Kostas Sakellis wrote: > Hello all, > > I'd like to close out the discussion on SPARK-13843 by getting a poll from > the community on which components we should seriously reconsider re-adding > back to Apache

Re: toPandas very slow

2016-03-22 Thread Josh Levy-Kramer
Hi all, Wes, I read your thread earlier today after I sent this message, and it's exciting that someone of your caliber is working on the issue :) For a short-term solution I've created a Gist which performs the toPandas operation using the mapPartitions method suggested by Mark:

Re: toPandas very slow

2016-03-22 Thread Wes McKinney
Hi all, I recently did an analysis of the performance of toPandas. Summary: http://wesmckinney.com/blog/pandas-and-apache-arrow/ IPython notebook: https://gist.github.com/wesm/0cb5531b1c2e346a0007 One solution I'm planning for this is an alternate serializer for Spark DataFrames, with an

Re: toPandas very slow

2016-03-22 Thread Mark Vervuurt
Hi Josh, The workaround we figured out to solve the network latency and out-of-memory problems with the toPandas method was to create pandas DataFrames or NumPy arrays using mapPartitions for each partition. Maybe a standard solution along this line of thought could be built. The integration is
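A minimal sketch of that per-partition idea (the helper name and the row-by-row conversion below are illustrative, not the code from Mark's implementation or Josh's Gist): each partition is turned into a small pandas DataFrame on the executors, the frames are collected, and the driver concatenates them.

```python
import pandas as pd

def collect_as_pandas(df):
    """Build one pandas DataFrame per partition on the executors,
    collect the small frames, and concatenate them on the driver."""
    columns = df.columns

    def partition_to_pandas(rows):
        # Materialise this partition as a single pandas DataFrame.
        yield pd.DataFrame([row.asDict() for row in rows], columns=columns)

    frames = df.rdd.mapPartitions(partition_to_pandas).collect()
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)
```

Usage would be something like local_df = collect_as_pandas(sqlContext.table("some_table")); the win over a plain toPandas is that rows are batched into a handful of pandas objects instead of being serialised one by one.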

new object store driver for Spark

2016-03-22 Thread Gil Vernik
We recently released an object store connector for Spark. https://github.com/SparkTC/stocator Currently this connector contains a driver for Swift-based object stores (like SoftLayer or any other Swift cluster), but it can easily support additional object stores. There is a pending patch to
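For orientation, a rough sketch of how such a connector plugs into Spark through the standard Hadoop FileSystem mechanism; the swift2d scheme and implementation class below are assumptions from memory, so check the stocator README for the exact configuration keys and the credential properties it needs.

```python
from pyspark import SparkConf, SparkContext

# Assumed scheme and class name -- the stocator README is authoritative.
conf = (SparkConf()
        .setAppName("stocator-sketch")
        .set("spark.hadoop.fs.swift2d.impl",
             "com.ibm.stocator.fs.ObjectStoreFileSystem"))
sc = SparkContext(conf=conf)

# Once the filesystem is registered, reads go straight to the object store.
rdd = sc.textFile("swift2d://mycontainer.myprovider/data/part-*")
print(rdd.count())
```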

Job description only visible after job finish

2016-03-22 Thread hansbogert
Hi, I'm trying to do some dynamic scheduling from an external application by looking at the jobs in a Spark framework. I need the job description to know which kind of query I'm dealing with. The problem is that the job description (set with sparkCtx.setJobDescription) only becomes visible after the job finishes, but in the case of a job with
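A small sketch of the pattern being described (description string, group id, and port are illustrative): tag work with a description on the driver, then poll the driver's REST API from outside. Whether the description is exposed before the job finishes is exactly the behaviour in question.

```python
import json
import urllib2  # Python 2 stdlib, matching the PySpark of this era

# Tag the next action so an external tool can recognise it. setJobGroup's
# second argument is the job description (the same property the Scala
# sparkCtx.setJobDescription call sets).
sc.setJobGroup("adhoc-queries", "illustrative: nightly aggregation query")
sc.parallelize(range(10000)).sum()

# The driver UI (default port 4040) also serves a JSON API; whether the
# description shows up while the job is still running is the open question.
url = "http://localhost:4040/api/v1/applications/%s/jobs" % sc.applicationId
print(json.load(urllib2.urlopen(url)))
```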

toPandas very slow

2016-03-22 Thread Josh Levy-Kramer
Hi, A common pattern in my work is querying large tables in Spark DataFrames and then needing to do more detailed analysis locally when the data can fit into memory. However, I've hit a few blockers. In Scala no well-developed DataFrame library exists, and in Python the `toPandas` function is very slow.
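A minimal sketch of that pattern (table and column names are made up for illustration): aggregate on the cluster, then pull the now-small result into pandas for local analysis, with toPandas being the step reported as slow.

```python
# Illustrative names only -- heavy work stays on the cluster, the small
# aggregated result is brought to the driver for local analysis.
result = (sqlContext.table("events")
          .filter("event_date >= '2016-01-01'")
          .groupBy("user_id")
          .count())

local_df = result.toPandas()   # the step reported as slow
print(local_df.describe())
```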

Re: error occurs to compile spark 1.6.1 using scala 2.11.8

2016-03-22 Thread Ted Yu
From the error message, it seems some artifacts from Scala 2.10.4 were left around. FYI, Maven 3.3.9 is required for the master branch. On Tue, Mar 22, 2016 at 3:07 AM, Allen wrote: > Hi, > > I am facing an error when doing compilation from IDEA, please see the > attached. I

StatefulNetworkWordCount behaviour

2016-03-22 Thread Rishi Mishra
I am trying out StatefulNetworkWordCount from the latest Spark master branch. When I run this example I see an odd behaviour: if a key is repeated within a batch, the output stream prints once for each repetition. E.g. if I key in "ab" five times as input, it will show (ab,1) (ab,2) (ab,3) (ab,4) (ab,5)
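For comparison, a minimal PySpark sketch of stateful word counting using updateStateByKey, which emits the running total once per key per batch; the per-occurrence output described above is consistent with the mapWithState-based example in master, which (if I recall correctly) returns one mapped record per input record. Host, port, and checkpoint path are placeholders.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="StatefulWordCountSketch")
ssc = StreamingContext(sc, 5)                   # 5-second batches
ssc.checkpoint("/tmp/stateful-wordcount-ckpt")  # required for stateful ops

def update(new_counts, running_total):
    # Fold this batch's occurrences into the running total for the key.
    return sum(new_counts) + (running_total or 0)

counts = (ssc.socketTextStream("localhost", 9999)
             .flatMap(lambda line: line.split(" "))
             .map(lambda word: (word, 1))
             .updateStateByKey(update))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```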

Re: Can we remove private[spark] from Metrics Source and SInk traits?

2016-03-22 Thread Steve Loughran
On 19 Mar 2016, at 16:16, Pete Robbins wrote: There are several open JIRAs to add new sinks: OpenTSDB https://issues.apache.org/jira/browse/SPARK-12194 StatsD https://issues.apache.org/jira/browse/SPARK-11574 StatsD is nicely easy to test:

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread james
I guess a different workload causes a different result? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-Unable-to-acquire-bytes-of-memory-tp16773p16789.html

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread Nezih Yigitbasi
Interesting. After experimenting with various parameters, increasing spark.sql.shuffle.partitions and decreasing spark.buffer.pageSize helped my job go through. BTW, I will be happy to help get this issue fixed. Nezih On Tue, Mar 22, 2016 at 1:07 AM james wrote: Hi, > I
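For anyone hitting the same error, a sketch of where those two knobs are set; the values are only examples, and spark.buffer.pageSize is an internal, undocumented setting, so treat it as a last resort.

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Example values only -- the right numbers depend on the workload.
conf = (SparkConf()
        .set("spark.buffer.pageSize", "2m"))   # smaller memory pages per task
sc = SparkContext(conf=conf)

sqlContext = SQLContext(sc)
# More, smaller shuffle partitions reduce per-task memory pressure.
sqlContext.setConf("spark.sql.shuffle.partitions", "2000")
```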

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread james
Hi, I also hit the 'Unable to acquire memory' issue using Spark 1.6.1 with dynamic allocation on YARN. My case happened when setting spark.sql.shuffle.partitions larger than 200. From the error stack, it differs from the issue reported by Nezih, and I'm not sure whether they have the same root cause. Thanks, James

Re: SPARK-13843 Next steps

2016-03-22 Thread Jean-Baptiste Onofré
OK, so Kafka, Kinesis and Flume will stay in Spark. Thanks, Regards JB On 03/22/2016 08:30 AM, Reynold Xin wrote: Kinesis is still in it. I think it's OK to add Flume back. On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré wrote: Thanks

Re: SPARK-13843 Next steps

2016-03-22 Thread Reynold Xin
Kinesis is still in it. I think it's OK to add Flume back. On Tue, Mar 22, 2016 at 12:29 AM, Jean-Baptiste Onofré wrote: > Thanks for the update Kostas, > > for now, kafka stays in Spark and Kinesis will be removed, right ? > > Regards > JB > > On 03/22/2016 08:27 AM, Kostas

Re: SPARK-13843 Next steps

2016-03-22 Thread Jean-Baptiste Onofré
Thanks for the update, Kostas. For now, Kafka stays in Spark and Kinesis will be removed, right? Regards JB On 03/22/2016 08:27 AM, Kostas Sakellis wrote: Hello all, I'd like to close out the discussion on SPARK-13843 by getting a poll from the community on which components we should

SPARK-13843 Next steps

2016-03-22 Thread Kostas Sakellis
Hello all, I'd like to close out the discussion on SPARK-13843 by getting a poll from the community on which components we should seriously reconsider re-adding to Apache Spark. For reference, here are the modules that were removed as part of SPARK-13843 and pushed to: