Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Hamstra
No, that isn't necessarily enough to be considered a blocker. A blocker would be something that would have large negative effects on a significant number of people trying to run Spark. Arguably, something that prevents a minority of Spark developers from running unit tests on one OS does not

RE: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Ulanov, Alexander
Here is the fix https://github.com/apache/spark/pull/13868 From: Reynold Xin [mailto:r...@databricks.com] Sent: Wednesday, June 22, 2016 6:43 PM To: Ulanov, Alexander Cc: Mark Hamstra ; Marcelo Vanzin ; dev@spark.apache.org

Why did Spark 2.0 disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.)?

2016-06-22 Thread linxi zeng
Hi All, I have tried the Spark SQL of Spark branch-2.0 and encountered an unexpected problem: Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'orc'(line 1, pos 0) the SQL is like: CREATE TABLE IF NOT EXISTS test.test_orc ( ... ) PARTITIONED BY (xxx) ROW
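The DDL in the report can be sketched as follows: Spark 2.0 rejects ROW FORMAT DELIMITED (a text SerDe layout) combined with a binary format, while dropping the clause lets ORC supply its own serialization. Table and column names below are placeholders, not the reporter's actual schema.

```sql
-- Rejected by Spark 2.0: ROW FORMAT DELIMITED describes a text layout,
-- which conflicts with a self-describing binary format such as ORC.
CREATE TABLE IF NOT EXISTS test.test_orc (id INT)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS ORC;

-- Accepted: let the ORC SerDe handle serialization itself.
CREATE TABLE IF NOT EXISTS test.test_orc (id INT)
PARTITIONED BY (dt STRING)
STORED AS ORC;
```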

Re: Creating a python port for a Scala Spark Project

2016-06-22 Thread Daniel Imberman
Thank you Holden, I look forward to watching your talk! On Wed, Jun 22, 2016 at 7:12 PM Holden Karau wrote: > PySpark RDDs are (on the Java side) essentially RDDs of pickled objects > and mostly (but not entirely) opaque to the JVM. It is possible (by using > some

Re: Creating a python port for a Scala Spark Project

2016-06-22 Thread Holden Karau
PySpark RDDs are (on the Java side) essentially RDDs of pickled objects and mostly (but not entirely) opaque to the JVM. It is possible (by using some internals) to pass a PySpark DataFrame to a Scala library (you may or may not find the talk I gave at Spark Summit useful
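A minimal sketch of the representation Holden describes, using only the standard pickle module. The records are placeholder data; this illustrates the pickled-bytes idea, not PySpark's actual serializer code.

```python
import pickle

# PySpark represents a Python RDD on the JVM side as an RDD of pickled byte
# strings: Python round-trips the objects, the JVM just holds opaque bytes.
records = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}]
opaque = [pickle.dumps(r) for r in records]         # what the JVM holds
assert all(isinstance(b, bytes) for b in opaque)    # opaque to Scala/Java
restored = [pickle.loads(b) for b in opaque]        # what Python reads back
assert restored == records
```

Because the payload is opaque to the JVM, handing data to a Scala library generally means going through a DataFrame (which has a JVM-side representation) rather than a plain pickled RDD.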

Creating a python port for a Scala Spark Project

2016-06-22 Thread Daniel Imberman
Hi All, I've developed a Spark module in Scala that I would like to add a Python port for. I want to be able to allow users to create a PySpark RDD and send it to my system. I've been looking into the PySpark source code as well as Py4J and was wondering if there has been anything like this

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Reynold Xin
Alex - if you have access to a Windows box, can you fix the issue? I'm not sure how many Spark contributors have Windows boxes. On Wed, Jun 22, 2016 at 5:56 PM, Ulanov, Alexander wrote: > Spark Unit tests fail on Windows in Spark 2.0. It can be considered as >

RE: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Ulanov, Alexander
Spark unit tests fail on Windows in Spark 2.0. It can be considered a blocker since there are people who develop for Spark on Windows. The referenced issue is indeed Minor and has nothing to do with unit tests. From: Mark Hamstra [mailto:m...@clearstorydata.com] Sent: Wednesday, June 22, 2016

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Hamstra
It's also marked as Minor, not Blocker. On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin wrote: > On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander > wrote: > > -1 > > > > Spark Unit tests fail on Windows. Still not resolved, though marked as > >

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Marcelo Vanzin
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote: > -1 > > Spark Unit tests fail on Windows. Still not resolved, though marked as > resolved. To be pedantic, it's marked as a duplicate (https://issues.apache.org/jira/browse/SPARK-15899), which doesn't mean

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Hamstra
SPARK-15893 is resolved as a duplicate of SPARK-15899. SPARK-15899 is Unresolved. On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote: > -1 > > Spark Unit tests fail on Windows. Still not resolved, though marked as > resolved. > >

RE: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Ulanov, Alexander
-1 Spark Unit tests fail on Windows. Still not resolved, though marked as resolved. https://issues.apache.org/jira/browse/SPARK-15893 From: Reynold Xin [mailto:r...@databricks.com] Sent: Tuesday, June 21, 2016 6:27 PM To: dev@spark.apache.org Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sameer Agarwal
+1 On Wed, Jun 22, 2016 at 1:07 PM, Kousuke Saruta wrote: > +1 (non-binding) > > On 2016/06/23 4:53, Reynold Xin wrote: > > +1 myself > > > On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara < > sean.mcnam...@webtrends.com> wrote: > >> +1 >> >> On Jun 22, 2016, at 1:14

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Kousuke Saruta
+1 (non-binding) On 2016/06/23 4:53, Reynold Xin wrote: +1 myself On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara > wrote: +1 On Jun 22, 2016, at 1:14 PM, Michael Armbrust

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean McNamara
+1 On Jun 22, 2016, at 1:14 PM, Michael Armbrust > wrote: +1 On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly > wrote: +1 On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Michael Armbrust
+1 On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly wrote: > +1 > > On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter > wrote: > >> +1 This release passes all tests on the graphframes and tensorframes >> packages. >> >> On Wed, Jun 22, 2016 at 7:19

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Jonathan Kelly
+1 On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter wrote: > +1 This release passes all tests on the graphframes and tensorframes > packages. > > On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger > wrote: > >> If we're considering backporting changes for

Re: Question about Bloom Filter in Spark 2.0

2016-06-22 Thread Jörn Franke
You should look at it at both levels: there is one bloom filter for ORC data and one for data in-memory. It is already a good step towards an integration of format and in-memory representation for columnar data. > On 22 Jun 2016, at 14:01, BaiRan wrote: > > After building
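Spark 2.0 can build such a filter over a column via `DataFrameStatFunctions.bloomFilter` (SPARK-12818, referenced in this thread). The toy Python sketch below illustrates the mechanism Jörn describes; it is not Spark's or ORC's actual implementation, and the hash scheme and sizes are arbitrary choices for illustration.

```python
import hashlib

class BloomFilter:
    """Toy bloom filter: k hash positions over an m-bit integer."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k positions by salting a SHA-256 of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # No false negatives; false positives only cost an extra read.
        return all((self.bits >> p) & 1 for p in self._positions(item))

# Predicate-pushdown idea: skip reading a file/stripe whose filter proves
# the queried value is definitely absent.
stripe_filter = BloomFilter()
for v in ["alice", "bob"]:
    stripe_filter.add(v)
assert stripe_filter.might_contain("alice")
```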

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Tim Hunter
+1 This release passes all tests on the graphframes and tensorframes packages. On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote: > If we're considering backporting changes for the 0.8 kafka > integration, I am sure there are people who would like to get > >

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Grover
Yeah, I am +1 for including Kafka 0.10 integration as well. We had to wait for Kafka 0.10 because there were incompatibilities between the Kafka 0.9 and 0.10 API. And, yes, the code for 0.8.0 remains unchanged so there shouldn't be any regression for existing users. It's only new code for 0.10.

[build system] jenkins process wedged, need to do restart

2016-06-22 Thread shane knapp
of course, on my first day back from vacation, i notice that the jenkins process got wedged immediately upon my visiting the page. one quick jenkins/httpd restart later and we're back up and building. sorry for any inconvenience! shane

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Chris Fregly
+1 for 0.10 support. this is huge. On Wed, Jun 22, 2016 at 8:17 AM, Cody Koeninger wrote: > Luciano knows there are publicly available examples of how to use the > 0.10 connector, including TLS support, because he asked me about it > and I gave him a link > > >

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Cody Koeninger
Luciano knows there are publicly available examples of how to use the 0.10 connector, including TLS support, because he asked me about it and I gave him a link https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala If any committer at any time had
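The TLS support in the linked example amounts to passing the standard Kafka >= 0.9 SSL client settings through to the new consumer. A hedged sketch of those parameters; hostnames, paths, and passwords are placeholders.

```python
# Standard Kafka >= 0.9 SSL client settings, which the 0.10 connector can
# pass straight through to the new consumer. All values are placeholders.
kafka_params = {
    "bootstrap.servers": "broker1:9093",                # SSL listener port
    "security.protocol": "SSL",
    "ssl.truststore.location": "/path/to/truststore.jks",
    "ssl.truststore.password": "changeit",
    # Keystore entries are only needed when the broker requires client auth.
    "ssl.keystore.location": "/path/to/keystore.jks",
    "ssl.keystore.password": "changeit",
}
assert kafka_params["security.protocol"] == "SSL"
```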

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Luciano Resende
On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger wrote: > As far as I know the only thing blocking it at this point is lack of > committer review / approval. > > It's technically adding a new feature after spark code-freeze, but it > doesn't change existing code, and the kafka

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Sean Owen
Hm, I thought that was to be added for 2.0. Imran, I know you may have been working alongside Mark on it; what do you think? TD / Reynold, would you object to it for 2.0? On Wed, Jun 22, 2016 at 3:46 PM, Cody Koeninger wrote: > As far as I know the only thing blocking it at

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Cody Koeninger
As far as I know the only thing blocking it at this point is lack of committer review / approval. It's technically adding a new feature after spark code-freeze, but it doesn't change existing code, and the kafka project didn't release 0.10 until the end of may. On Wed, Jun 22, 2016 at 9:39 AM,

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Sean Owen
I profess ignorance again though I really should know by now, but, what's opposing that? I personally thought this was going to be in 2.0 and somehow didn't notice that it wasn't ... On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger wrote: > I don't have a vote, but I'd just like to

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Nicholas Chammas
For the clueless (like me): https://bahir.apache.org/#home Apache Bahir provides extensions to distributed analytic platforms such as Apache Spark. Initially Apache Bahir will contain streaming connectors that were a part of Apache Spark prior to version 2.0: - streaming-akka -

Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Cody Koeninger
I don't have a vote, but I'd just like to reiterate that I think kafka 0.10 support should be added to a 2.0 release candidate; if not now, then well before release. - it's a completely standalone jar, so shouldn't break anyone who's using the existing 0.8 support - it's like the 5th highest

Re: Spark internal Logging trait potential thread unsafe

2016-06-22 Thread Prajwal Tuladhar
Created a JIRA issue https://issues.apache.org/jira/browse/SPARK-16131 and PR @ https://github.com/apache/spark/pull/13842 On Fri, Jun 17, 2016 at 5:19 AM, Sean Owen wrote: > I think that's OK to change, yes. I don't see why it's necessary to > init log_ the way it is now.
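The race in question is the classic unsynchronized lazy-initialization pattern. The actual PR is Scala, but the double-checked-locking fix can be sketched in Python; the class and field names below are illustrative, not Spark's.

```python
import threading

class Logging:
    """Logger field initialized lazily with double-checked locking, so two
    threads racing through log() cannot each build their own logger."""
    _lock = threading.Lock()

    def __init__(self):
        self._log = None

    def log(self):
        if self._log is None:           # fast path: lock-free once set
            with self._lock:            # slow path: serialize initialization
                if self._log is None:   # re-check under the lock
                    self._log = "logger-%d" % id(self)
        return self._log

inst = Logging()
results = []
threads = [threading.Thread(target=lambda: results.append(inst.log()))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(set(results)) == 1  # every thread observed the same logger
```

The second `is None` check inside the lock is what makes the pattern safe: a thread that lost the race re-reads the field instead of re-initializing it.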

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean Owen
Good call, probably worth back-porting, I'll try to do that. I don't think it blocks a release, but would be good to get into a next RC if any. On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins wrote: > This has failed on our 1.6 stream builds regularly. >

Re: Question about Bloom Filter in Spark 2.0

2016-06-22 Thread BaiRan
After building a bloom filter on existing data, does the Spark engine utilise the bloom filter during query processing? Is there any plan about predicate push-down using bloom filters in ORC / Parquet? Thanks Ran > On 22 Jun, 2016, at 10:48 am, Reynold Xin wrote: > > SPARK-12818

Spark Task failure with File segment length as negative

2016-06-22 Thread Priya Ch
Hi All, I am running a Spark application with 1.8TB of data (which is stored in Hive table format). I am reading the data using HiveContext and processing it. The cluster has 5 nodes total, 25 cores per machine and 250GB per node. I am launching the application with 25 executors with 5 cores each
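The resource arithmetic in the report adds up to full cluster utilization, which can be checked directly; the memory figure is only an upper bound before YARN and OS overhead are subtracted.

```python
# Cluster and submission figures as reported in the message above.
nodes, cores_per_node, mem_per_node_gb = 5, 25, 250
executors, cores_per_executor = 25, 5

used_cores = executors * cores_per_executor     # 125
total_cores = nodes * cores_per_node            # 125: every core is claimed
assert used_cores == total_cores

executors_per_node = executors // nodes                      # 5 per node
mem_per_executor_gb = mem_per_node_gb // executors_per_node  # 50 GB ceiling
assert mem_per_executor_gb == 50   # upper bound before YARN/OS overhead
```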

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Pete Robbins
This has failed on our 1.6 stream builds regularly (https://issues.apache.org/jira/browse/SPARK-6005). Looks fixed in 2.0? On Wed, 22 Jun 2016 at 11:15 Sean Owen wrote: > Oops, one more in the "does anybody else see this" department: > > - offset recovery *** FAILED *** >

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean Owen
Oops, one more in the "does anybody else see this" department: - offset recovery *** FAILED *** recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time, Array[org.apache.spark.streaming.kafka.OffsetRange])) =>

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean Owen
I'm fairly convinced this error and others that appear timestamp related are an environment problem. This test and method have been present for several Spark versions, without change. I reviewed the logic and it seems sound, explicitly setting the time zone correctly. I am not sure why it behaves