Re: Build changes after SPARK-13579

2016-04-04 Thread Reynold Xin
pyspark and R On Mon, Apr 4, 2016 at 9:59 PM, Marcelo Vanzin wrote: > No, tests (except pyspark) should work without having to package anything > first. > > On Mon, Apr 4, 2016 at 9:58 PM, Koert Kuipers wrote: > > do i need to run sbt package before

Re: Build changes after SPARK-13579

2016-04-04 Thread Marcelo Vanzin
No, tests (except pyspark) should work without having to package anything first. On Mon, Apr 4, 2016 at 9:58 PM, Koert Kuipers wrote: > do i need to run sbt package before doing tests? > > On Mon, Apr 4, 2016 at 11:00 PM, Marcelo Vanzin wrote: >> >> Hey

Re: Build changes after SPARK-13579

2016-04-04 Thread Koert Kuipers
do i need to run sbt package before doing tests? On Mon, Apr 4, 2016 at 11:00 PM, Marcelo Vanzin wrote: > Hey all, > > We merged SPARK-13579 today, and if you're like me and have your > hands automatically type "sbt assembly" anytime you're building Spark, > that won't

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-04-04 Thread Nezih Yigitbasi
Nope, I didn't have a chance to track the root cause, and IIRC we didn't observe it when dyn. alloc. is off. On Mon, Apr 4, 2016 at 6:16 PM Reynold Xin wrote: > BTW do you still see this when dynamic allocation is off? > > On Mon, Apr 4, 2016 at 6:16 PM, Reynold Xin

Build changes after SPARK-13579

2016-04-04 Thread Marcelo Vanzin
Hey all, We merged SPARK-13579 today, and if you're like me and have your hands automatically type "sbt assembly" anytime you're building Spark, that won't work anymore. You should now use "sbt package"; you'll still need "sbt assembly" if you require one of the remaining assemblies (streaming
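
A minimal sketch of the workflow described above (the assembly target and the test suite name are placeholders, not taken from the thread):

    # day-to-day builds: "sbt package" replaces the old "sbt assembly" habit
    ./build/sbt package

    # "sbt assembly" is still needed only if you require one of the remaining
    # assemblies (e.g. a streaming connector assembly -- target name assumed)
    ./build/sbt streaming-kafka-assembly/assembly

    # tests (except pyspark) should run without packaging first
    ./build/sbt "testOnly org.apache.spark.SomeSuite"   # suite name is a placeholder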

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Koert Kuipers
can you try: spark.shuffle.reduceLocality.enabled=false On Mon, Apr 4, 2016 at 8:17 PM, Mike Hynes <91m...@gmail.com> wrote: > Dear all, > > Thank you for your responses. > > Michael Slavitch: > > Just to be sure: Has spark-env.sh and spark-defaults.conf been > correctly propagated to all
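
A minimal sketch of passing the suggested property when building the context (property name as given above; the app name is just a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // disable reduce-task locality preference, as suggested above
    val conf = new SparkConf()
      .setAppName("partition-balance-test")   // placeholder name
      .set("spark.shuffle.reduceLocality.enabled", "false")
    val sc = new SparkContext(conf)

The same property can also be set in spark-defaults.conf or via --conf on spark-submit.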

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-04-04 Thread Reynold Xin
Nezih, Have you had a chance to figure out why this is happening? On Tue, Mar 22, 2016 at 1:32 AM, james wrote: > I guess different workload cause diff result ?

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-04-04 Thread Reynold Xin
BTW do you still see this when dynamic allocation is off? On Mon, Apr 4, 2016 at 6:16 PM, Reynold Xin wrote: > Nezih, > > Have you had a chance to figure out why this is happening? > > > On Tue, Mar 22, 2016 at 1:32 AM, james wrote: > >> I guess
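
For anyone reproducing this, a rough sketch of what running with dynamic allocation off might look like (standard property names; the executor count is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // fix the number of executors instead of using dynamic allocation
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.executor.instances", "8")   // placeholder count
    val sc = new SparkContext(conf)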

Re: error: reference to sql is ambiguous after import org.apache.spark._ in shell?

2016-04-04 Thread Ted Yu
Looks like the import comes from repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala : processLine("import sqlContext.sql") On Mon, Apr 4, 2016 at 5:16 PM, Jacek Laskowski wrote: > Hi Spark devs, > > I'm unsure if what I'm seeing is correct. I'd
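
A minimal sketch of the clash being described, assuming the stock shell where sqlContext is predefined and the REPL has already run the import above:

    // the shell itself runs `import sqlContext.sql`, binding the name `sql`
    // to the method sqlContext.sql
    import org.apache.spark._
    // the wildcard import also brings the package org.apache.spark.sql into
    // scope, so a bare `sql` now matches both the package and the method:
    sql("select 1")            // fails: "error: reference to sql is ambiguous"
    sqlContext.sql("select 1") // disambiguating explicitly avoids the clash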

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Mike Hynes
Dear all, Thank you for your responses. Michael Slavitch: > Just to be sure: Has spark-env.sh and spark-defaults.conf been correctly > propagated to all nodes? Are they identical? Yes; these files are stored on a shared memory directory accessible to all nodes. Koert Kuipers: > we ran into

error: reference to sql is ambiguous after import org.apache.spark._ in shell?

2016-04-04 Thread Jacek Laskowski
Hi Spark devs, I'm unsure if what I'm seeing is correct. I'd appreciate any input to...rest my nerves :-) I did `import org.apache.spark._` by mistake, but since it's valid, I'm wondering why the Spark shell imports sql at all, since it's available after the import?! (it's today's build) scala>

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Nicholas Chammas
Thanks, that was the command. :thumbsup: On Mon, Apr 4, 2016 at 6:28 PM Jakob Odersky wrote: > I just found out how the hash is calculated: > > gpg --print-md sha512 .tgz > > you can use that to check if the resulting output matches the contents > of .tgz.sha > > On Mon, Apr

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Jakob Odersky
I just found out how the hash is calculated: gpg --print-md sha512 .tgz you can use that to check if the resulting output matches the contents of .tgz.sha On Mon, Apr 4, 2016 at 3:19 PM, Jakob Odersky wrote: > The published hash is a SHA512. > > You can verify the integrity
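
Putting the two commands from this thread together, a hedged sketch of the verification (the file name is just an example):

    FILE=spark-1.6.1-bin-hadoop2.6.tgz   # example file name

    # reproduce the published format, as found above
    gpg --print-md sha512 "$FILE"

    # or compute a plain hex digest and compare it by eye with the .sha file
    sha512sum "$FILE"
    cat "$FILE.sha"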

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Jakob Odersky
The published hash is a SHA512. You can verify the integrity of the packages by running `sha512sum` on the archive and comparing the computed hash with the published one. Unfortunately however, I don't know what tool is used to generate the hash and I can't reproduce the format, so I ended up

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-04-04 Thread Karlis Zigurs
Curveball: Is there a need to use lambdas quite yet? On Mon, Apr 4, 2016 at 10:58 PM, Ofir Manor wrote: > I think that a backup plan could be to announce that JDK7 is deprecated in > Spark 2.0 and support for it will be fully removed in Spark 2.1. This gives > admins

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Nicholas Chammas
An additional note: The Spark packages being served off of CloudFront (i.e. the “direct download” option on spark.apache.org) are also corrupt. Btw what’s the correct way to verify the SHA of a Spark package? I’ve tried a few commands on working packages downloaded from Apache mirrors, but I

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-04-04 Thread Ofir Manor
I think that a backup plan could be to announce that JDK7 is deprecated in Spark 2.0 and support for it will be fully removed in Spark 2.1. This gives admins enough warning to install JDK8 alongside their "main" JDK (or fully migrate to it), while allowing the project to merge JDK8-specific

Re: running lda in spark throws exception

2016-04-04 Thread Joseph Bradley
It's possible this was caused by incorrect Graph creation, fixed in [SPARK-13355]. Could you retry your dataset using the current master to see if the problem is fixed? Thanks! On Tue, Jan 19, 2016 at 5:31 AM, Li Li wrote: > I have modified my codes. I can get the total

Re: [SQL] Dataset.map gives error: missing parameter type for expanded function?

2016-04-04 Thread Michael Armbrust
It is called groupByKey now. Similar to joinWith, the schema produced by relational joins and aggregations is different than what you would expect when working with objects. So, when combining DataFrame+Dataset we renamed these functions to make this distinction clearer. On Sun, Apr 3, 2016 at
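
A minimal sketch of the renamed API (assuming a shell where the SQL implicits are in scope; the case class and data are placeholders):

    import sqlContext.implicits._

    case class Person(name: String, age: Int)
    val ds = Seq(Person("a", 1), Person("b", 2), Person("a", 3)).toDS()

    // object-level grouping is now spelled groupByKey, as described above,
    // to distinguish it from the relational groupBy on columns
    val counts = ds.groupByKey(_.name).count()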

Re: explain codegen

2016-04-04 Thread Ted Yu
Thanks to all who have responded. It turned out that the following Maven command line caused the error (I forgot to include this in the first email): eclipse:eclipse Once I omitted the above, 'explain codegen' works. On Mon, Apr 4, 2016 at 9:37 AM, Reynold Xin wrote: >

Re: explain codegen

2016-04-04 Thread Reynold Xin
Why don't you wipe everything out and try again? On Monday, April 4, 2016, Ted Yu wrote: > The commit you mentioned was made Friday. > I refreshed workspace Sunday - so it was included. > > Maybe this was related: > > $ bin/spark-shell > Failed to find Spark jars directory

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Koert Kuipers
we ran into similar issues and it seems related to the new memory management. can you try: spark.memory.useLegacyMode = true On Mon, Apr 4, 2016 at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote: > [ CC'ing dev list since nearly identical questions have occurred in > user list recently w/o
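
A minimal sketch of trying the suggested fallback (property name as given above; the app name is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // fall back to the pre-1.6 static memory manager, as suggested above
    val conf = new SparkConf()
      .setAppName("legacy-memory-test")   // placeholder name
      .set("spark.memory.useLegacyMode", "true")
    val sc = new SparkContext(conf)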

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-04-04 Thread Luciano Resende
Reynold, Considering the performance improvements you mentioned in your original e-mail, and also considering that a few other big data projects have already abandoned or are in the process of abandoning JDK 7, I think it would benefit Spark if we go with JDK 8.0 only. Are there users that will be less

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Ted Yu
Maybe temporarily take out the artifacts on S3 before the root cause is found. On Thu, Mar 24, 2016 at 7:25 AM, Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Just checking in on this again as the builds on S3 are still broken. :/ > > Could it have something to do with us moving

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Kousuke Saruta
Thanks. Of course, I verified the checksum and it didn't match. Kousuke On 2016/04/05 0:39, Jitendra Shelar wrote: We can think of using a checksum for this kind of issue. On Mon, Apr 4, 2016 at 8:32 PM, Kousuke Saruta wrote:

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-04-04 Thread Xuefeng Wu
Many open source projects are aggressive, such as Oracle JDK and Ubuntu, but they provide stable commercial support. In other words, enterprises that don't drop JDK7 might also not drop Spark 1.x to adopt early Spark 2.x versions. On Sun, Apr 3, 2016 at 10:29 PM -0700, "Reynold

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Jitendra Shelar
We can think of using a checksum for this kind of issue. On Mon, Apr 4, 2016 at 8:32 PM, Kousuke Saruta wrote: > Oh, I overlooked that. Thanks. > > Kousuke > > > On 2016/04/04 22:58, Nicholas Chammas wrote: > > This is still an issue. The Spark 1.6.1 packages on S3

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Kousuke Saruta
Oh, I overlooked that. Thanks. Kousuke On 2016/04/04 22:58, Nicholas Chammas wrote: This is still an issue. The Spark 1.6.1 packages on S3 are corrupt. Is anyone looking into this issue? Is there anything contributors can do to help solve this problem? Nick On Sun, Mar 27, 2016 at 8:49

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Ted Yu
bq. the modifications do not touch the scheduler If the changes can be ported over to 1.6.1, do you mind reproducing the issue there? I ask because the master branch changes very fast. It would be good to narrow down where the behavior you observed started showing up. On Mon, Apr 4, 2016 at 6:12

Re: RDD Partitions not distributed evenly to executors

2016-04-04 Thread Michael Slavitch
Just to be sure: Has spark-env.sh and spark-defaults.conf been correctly propagated to all nodes? Are they identical? > On Apr 4, 2016, at 9:12 AM, Mike Hynes <91m...@gmail.com> wrote: > > [ CC'ing dev list since nearly identical questions have occurred in > user list recently w/o
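
One rough way to check that the two files really are identical across nodes (host names and install path are placeholders):

    for h in node1 node2 node3; do
      ssh "$h" "md5sum /opt/spark/conf/spark-env.sh /opt/spark/conf/spark-defaults.conf"
    done
    # identical files should produce identical digests on every host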

Re: explain codegen

2016-04-04 Thread Ted Yu
The commit you mentioned was made Friday. I refreshed workspace Sunday - so it was included. Maybe this was related: $ bin/spark-shell Failed to find Spark jars directory (/home/hbase/spark/assembly/target/scala-2.10). You need to build Spark before running this program. Then I did: $ ln -s

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Nicholas Chammas
This is still an issue. The Spark 1.6.1 packages on S3 are corrupt. Is anyone looking into this issue? Is there anything contributors can do to help solve this problem? Nick On Sun, Mar 27, 2016 at 8:49 PM Nicholas Chammas wrote: > Pingity-ping-pong since this is

RDD Partitions not distributed evenly to executors

2016-04-04 Thread Mike Hynes
[ CC'ing dev list since nearly identical questions have occurred in user list recently w/o resolution; c.f.: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html

Re: explain codegen

2016-04-04 Thread Herman van Hövell tot Westerflier
No, it can't. You only need implicits when you are using the catalyst DSL. The error you get is due to the fact that the parser does not recognize the CODEGEN keyword (which was the case before we introduced this in

Re: explain codegen

2016-04-04 Thread Ted Yu
Could the error I encountered be due to missing import(s) of implicit ? Thanks On Sun, Apr 3, 2016 at 9:42 PM, Reynold Xin wrote: > Works for me on latest master. > > > > scala> sql("explain codegen select 'a' as a group by 1").head > res3: org.apache.spark.sql.Row = >
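
For reference, the statement quoted above is runnable as-is in spark-shell on a current master build, with no extra imports; show(false) is just a convenience to print the plan without truncation:

    sql("explain codegen select 'a' as a group by 1").head
    sql("explain codegen select 'a' as a group by 1").show(false)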