Re: [EXT] Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Michael Mansour (CS)
This video https://www.youtube.com/watch?v=LQHMMCf2ZWY I think. On Wed, May 10, 2017 at 8:04 PM, lucas.g...@gmail.com <luca

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
>>>>> generated something strange which is hard to follow:
>>>>>
>>>>> (2) PythonRDD[13] at RDD at PythonRDD.scala:48 []
>>>>> | MapPartitionsRDD[12] at mapPartitions at PythonRDD.scala:422 []
>>>>> | ShuffledRDD[11] at part

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread lucas.g...@gmail.com
) PythonRDD[13] at RDD at PythonRDD.scala:48 []
>>>> | MapPartitionsRDD[12] at mapPartitions at PythonRDD.scala:422 []
>>>> | ShuffledRDD[11] at partitionBy at NativeMethodAccessorImpl.java:0 []
>>>> +-(2) PairwiseRDD[10] at reduceByKey at :1

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Holden Karau
| ../log.txt MapPartitionsRDD[8] at textFile at NativeMethodAccessorImpl.java:0 []
>>> | ../log.txt HadoopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 []
>>>
>>> Why is that? Does pyspark do some optimizations under th

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
tFile at NativeMethodAccessorImpl.java:0 []
>> | ../log.txt HadoopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 []
>>
>> Why is that? Does pyspark do some optimizations under the hood? This debug

Re: [Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Holden Karau
doopRDD[7] at textFile at NativeMethodAccessorImpl.java:0 []
>
> Why is that? Does pyspark do some optimizations under the hood? This debug
> string is really useless for debugging.
>
> --
> View this message in context:
> http://apache-spark

[Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread pklemenkov

[Spark Core]: Python and Scala generate different DAGs for identical code

2017-05-10 Thread Pavel Klemenkov
This Scala code:

scala> val logs = sc.textFile("big_data_specialization/log.txt").
     | filter(x => !x.contains("INFO")).
     | map(x => (x.split("\t")(1), 1)).
     | reduceByKey((x, y) => x + y)

generated an obvious lineage:

(2) ShuffledRDD[4] at reduceByKey at :27 []
 +-(2)
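
For readers following the thread without a Spark shell, the computation that the Scala pipeline performs can be sketched in plain Python. This is not Spark or PySpark code and says nothing about how either builds its DAG; it only mirrors the three logical steps (filter, map, reduceByKey). The function name `count_by_second_field` and the sample lines are hypothetical, since the actual format of log.txt is not shown in the thread.

```python
from collections import Counter


def count_by_second_field(lines):
    """Plain-Python stand-in for the filter -> map -> reduceByKey chain (not Spark)."""
    counts = Counter()
    for line in lines:
        # filter(x => !x.contains("INFO")): drop any line containing "INFO"
        if "INFO" in line:
            continue
        # map(x => (x.split("\t")(1), 1)): key on the second tab-separated field
        key = line.split("\t")[1]
        # reduceByKey((x, y) => x + y): sum the 1s per key
        counts[key] += 1
    return dict(counts)


# Hypothetical sample lines; the real log.txt contents are an assumption.
sample = [
    "2017-05-10\tERROR\tdisk full",
    "2017-05-10\tINFO\tstarted",
    "2017-05-10\tWARN\tslow task",
    "2017-05-10\tERROR\ttimeout",
]
result = count_by_second_field(sample)
```

In Spark, the per-key summation is the step that requires a shuffle, which is why a ShuffledRDD appears in both the Scala and the Python lineage quoted in this thread.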