Re: Map tuple to case class in Dataset
That error looks like it's caused by the class being defined in the REPL itself. $line29.$read$ is the name of the outer object that was used to compile the line containing case class Test(a: Int). Is this EMR or the Apache 1.6.1 release?

On Wed, Jun 1, 2016 at 8:05 AM, Tim Gautier wrote:
> I spun up another EC2 cluster today with Spark 1.6.1 and I still get the
> error.
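If the REPL-generated wrapper object is what fails to initialize on the executors, one common workaround is to define the case class in compiled code submitted via spark-submit rather than in the shell. This is only a sketch against the Spark 1.6 API, not something tested in this thread; the object name and app name are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// A top-level case class in compiled code has no $lineNN.$read$ wrapper,
// so executors can load it without initializing any REPL state.
case class Test(a: Int)

object TupleToCaseClass {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tuple-to-case-class"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // The same expression that failed in the shell.
    Seq(1, 2).toDS().map(t => Test(t)).show()

    sc.stop()
  }
}
```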
Re: Map tuple to case class in Dataset
I was getting a warning about /tmp/hive not being writable whenever I started spark-shell, but I was ignoring it. I decided to set the permissions to 777 and restart the shell. After doing that, I now get the same result as Ted Yu when running Seq(1,2).toDS.map(t => Test(t)).show.

On Wed, Jun 1, 2016 at 9:05 AM Tim Gautier wrote:
> I spun up another EC2 cluster today with Spark 1.6.1 and I still get the
> error.
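For anyone hitting the same warning, the fix described above amounts to making Hive's scratch directory writable before launching spark-shell. A minimal sketch, assuming the default /tmp/hive location (adjust if hive.exec.scratchdir points elsewhere):

```shell
# /tmp/hive is the default Hive scratch directory; spark-shell's embedded
# Hive support can fail in confusing ways if the launching user cannot
# write to it. Create it if missing, then open up the permissions.
mkdir -p /tmp/hive
chmod -R 777 /tmp/hive
```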
Re: Map tuple to case class in Dataset
I spun up another EC2 cluster today with Spark 1.6.1 and I still get the error.

scala> case class Test(a: Int)
defined class Test

scala> Seq(1,2).toDS.map(t => Test(t)).show
16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 39.0 in stage 0.0 (TID 39, ip-10-2-2-203.us-west-2.compute.internal):
java.lang.NoClassDefFoundError: Could not initialize class $line29.$read$
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

16/06/01 15:04:21 INFO scheduler.TaskSetManager: Starting task 39.1 in stage 0.0 (TID 40, ip-10-2-2-111.us-west-2.compute.internal, partition 39,PROCESS_LOCAL, 2386 bytes)
16/06/01 15:04:21 WARN scheduler.TaskSetManager: Lost task 19.0 in stage 0.0 (TID 19, ip-10-2-2-203.us-west-2.compute.internal):
java.lang.ExceptionInInitializerError
        at $line29.$read$$iwC.<init>(<console>:7)
        at $line29.$read.<init>(<console>:24)
        at $line29.$read$.<init>(<console>:28)
        at $line29.$read$.<clinit>(<console>)
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at $line33.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:35)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at $line3.$read$$iwC$$iwC.<init>(<console>:15)
        at $line3.$read$$iwC.<init>(<console>:24)
        at $line3.$read.<init>(<console>:26)
        at $line3.$read$.<init>(<console>:30)
        at $line3.$read$.<clinit>(<console>)
        ... 18 more

On Tue, May 31, 2016 at 8:48 PM Tim Gautier wrote:
> That's really odd. I copied that code directly out of the shell and it
> errored out on me, several times.
Re: Map tuple to case class in Dataset
That's really odd. I copied that code directly out of the shell and it errored out on me, several times. I wonder if something I did previously caused some instability. I'll see if it happens again tomorrow.

On Tue, May 31, 2016, 8:37 PM Ted Yu wrote:
> Using spark-shell of 1.6.1:
>
> scala> Seq(1,2).toDS.map(t => Test(t)).show
Re: Map tuple to case class in Dataset
Using spark-shell of 1.6.1:

scala> case class Test(a: Int)
defined class Test

scala> Seq(1,2).toDS.map(t => Test(t)).show
+---+
|  a|
+---+
|  1|
|  2|
+---+

FYI

On Tue, May 31, 2016 at 7:35 PM, Tim Gautier wrote:
> 1.6.1 The exception is a null pointer exception. I'll paste the whole
> thing after I fire my cluster up again tomorrow.
Re: Map tuple to case class in Dataset
1.6.1. The exception is a null pointer exception. I'll paste the whole thing after I fire my cluster up again tomorrow.

I take it by the responses that this is supposed to work?

Anyone know when the next version is coming out? I keep running into bugs with 1.6.1 that are hindering my progress.

On Tue, May 31, 2016, 8:21 PM Saisai Shao wrote:
> It works fine in my local test, I'm using latest master, maybe this bug
> is already fixed.
Re: Map tuple to case class in Dataset
It works fine in my local test, I'm using latest master, maybe this bug is already fixed.

On Wed, Jun 1, 2016 at 7:29 AM, Michael Armbrust <mich...@databricks.com> wrote:
> Version of Spark? What is the exception?
Re: Map tuple to case class in Dataset
Version of Spark? What is the exception?

On Tue, May 31, 2016 at 4:17 PM, Tim Gautier wrote:
> How should I go about mapping from say a Dataset[(Int,Int)] to a
> Dataset[]?
>
> I tried to use a map, but it throws exceptions:
>
> case class Test(a: Int)
> Seq(1,2).toDS.map(t => Test(t)).show
>
> Thanks,
> Tim