Re: Dataset API Question

2017-10-25 Thread Bernard Jesop
Actually, I realized that keeping the info would not be enough, as I would also need to locate the checkpoint files in order to delete them :/ 2017-10-25 19:07 GMT+02:00 Bernard Jesop <bernard.je...@gmail.com>: > As far as I understand, Dataset.rdd is not the same as InternalRDD. > It is just
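A sketch of one way to track checkpoint files for later cleanup, assuming Spark 2.x in a spark-shell session (the directory path is illustrative): for a plain RDD, RDD.getCheckpointFile exposes the on-disk location, but for a Dataset there is no public accessor, so recording the checkpoint directory and deleting it wholesale is a coarse workaround.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative path; any local/HDFS path works.
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

val rdd = spark.sparkContext.parallelize(1 to 100)
rdd.checkpoint()
rdd.count()            // run an action so the checkpoint is actually written
rdd.getCheckpointFile  // Option[String] holding the on-disk path

// Coarse cleanup once nothing depends on the checkpoints any more:
spark.sparkContext.getCheckpointDir.foreach { dir =>
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  fs.delete(new Path(dir), true) // recursive delete
}
```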

Re: Dataset API Question

2017-10-25 Thread Bernard Jesop
been checkpointed? Should I manually keep track of that info? 2017-10-25 15:51 GMT+02:00 Bernard Jesop <bernard.je...@gmail.com>: > Hello everyone, > > I have a question about checkpointing on dataset. > > It seems in 2.1.0 that there is a Dataset.checkpoint(

Dataset API Question

2017-10-25 Thread Bernard Jesop
Hello everyone, I have a question about checkpointing on Datasets. It seems that in 2.1.0 there is a Dataset.checkpoint(); however, unlike RDD, there is no Dataset.isCheckpointed(). I wonder whether Dataset.checkpoint is syntactic sugar for Dataset.rdd.checkpoint. When I do: Dataset.checkpoint;
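A hedged sketch of the difference (Spark 2.1 in spark-shell; the sample data is made up): Dataset.checkpoint is not sugar for Dataset.rdd.checkpoint — it checkpoints the Dataset's internal RDD and returns a new Dataset whose plan reads from the checkpoint, leaving the original Dataset untouched.

```scala
import spark.implicits._

spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

val df = Seq(("Scala", 35), ("Python", 30)).toDF("_1", "_2")

val cp = df.checkpoint() // eager by default: materializes immediately

// The original Dataset was never checkpointed, so this stays false:
df.rdd.isCheckpointed

// Use the returned Dataset from here on; that is where the truncated
// lineage lives.
cp.count()
```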

Re: underlying checkpoint

2017-07-13 Thread Bernard Jesop
cala> df.rdd.isCheckpointed
res2: Boolean = false

scala> df.show()
+------+---+
|    _1| _2|
+------+---+
| Scala| 35|
|Python| 30|
|     R| 15|
|  Java| 20|
+------+---+

scala> df.rdd.isCheckpointed
res4: Boolean = false

sca
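The behaviour in the transcript can be explained with a short sketch (assumptions: Spark 2.1 in spark-shell, illustrative data): df.checkpoint() checkpoints the internal RDD of InternalRows and returns a new Dataset, while df.rdd is a different, derived RDD, so its isCheckpointed flag is never set.

```scala
import spark.implicits._

spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

val df = Seq(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)).toDF

val cp = df.checkpoint() // checkpoint files are written here (eager)

df.rdd.isCheckpointed    // false: this user-facing RDD was never checkpointed
cp.rdd.isCheckpointed    // also false, for the same reason

// The checkpoint lives at the internal-plan level instead
// (developer API, subject to change):
cp.queryExecution.toRdd
```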

underlying checkpoint

2017-07-13 Thread Bernard Jesop
nted". Do you have any idea why? (knowing that the checkpoint file is created) Best regards, Bernard JESOP

Re: Analysis Exception after join

2017-07-04 Thread Bernard Jesop
It seems to be because of this issue: https://issues.apache.org/jira/browse/SPARK-10925 I added a checkpoint, as suggested, to break the lineage and it worked. Best regards, 2017-07-04 17:26 GMT+02:00 Bernard Jesop <bernard.je...@gmail.com>: > Thank Didac, > > My bad, act
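The fix described here can be sketched as follows (df and the count aggregate are assumptions, since the original messages are truncated): checkpointing the aggregate before the self-join breaks the lineage, so the two sides of the join no longer share attribute IDs.

```scala
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

// Assumed shape, following the thread; S_ID comes from the original post.
val dfAgg   = df.groupBy("S_ID").count()
val dfAggCp = dfAgg.checkpoint() // materializes and truncates lineage

val dfRes = df.join(dfAggCp, Seq("S_ID"), "left_outer")
```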

Re: Analysis Exception after join

2017-07-04 Thread Bernard Jesop
rouped by S_ID. > > I guess that you are looking for something more like the following example > dfAgg = df.groupBy("S_ID") > .agg(org.apache.spark.sql.functions.count("userName").as("usersCount"), > .agg(org.apache.spark.sql.functions.collect_set("city
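A cleaned-up, hedged reconstruction of the (truncated) suggestion in this reply — the userName and city columns come from the quoted text, df and the alias names are assumptions; note that agg takes all aggregate expressions in a single call rather than being chained:

```scala
import org.apache.spark.sql.functions.{count, collect_set}

val dfAgg = df.groupBy("S_ID")
  .agg(
    count("userName").as("usersCount"),
    collect_set("city").as("cities") // alias is an assumption
  )
```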

Analysis Exception after join

2017-07-03 Thread Bernard Jesop
Hello, I don't understand my error message. Basically, all I am doing is:
- dfAgg = df.groupBy("S_ID")
- dfRes = df.join(dfAgg, Seq("S_ID"), "left_outer")
However, I get this AnalysisException: "Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved attribute(s) S_ID#1903L
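The two steps above can be sketched as follows (the aggregate expression is an assumption, since the original message truncates right after the groupBy):

```scala
val dfAgg = df.groupBy("S_ID").count() // assumed aggregate
val dfRes = df.join(dfAgg, Seq("S_ID"), "left_outer")
// Can fail with: org.apache.spark.sql.AnalysisException:
// resolved attribute(s) S_ID#...L missing ...
// This is the self-join attribute-ID clash tracked as SPARK-10925.
```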