Thanks for the update on yarn-client. I get the difference between the driver and the executor, but why does the output differ between Spark local mode in spark-shell and Zeppelin running on the same machine?

As a user, I would expect the output to be the same on both systems whenever the executor runs on the same host as the driver.

--
Alex

On Mon, May 8, 2017, 19:10 Jeff Zhang <zjf...@gmail.com> wrote:

> This is expected, and I believe you are using local mode. You should be
> able to get the same output in yarn-client mode.
> The println function is invoked on the executor side, while the IMain of the
> Spark REPL only captures the output of the driver. The reason you see the
> output of println in spark-shell is that the executor runs on the same host
> as the driver, so its output is mixed with the driver's output.
>
> Alexander Bezzubov <b...@apache.org> wrote on Mon, May 8, 2017 at 7:18 AM:
>
> > Hey guys,
> >
> > While introducing Apache Zeppelin to a new org, I recently noticed that in
> > quite a simple but important use case the output of Zeppelin is *very
> > different* from that of spark-shell.
> >
> > I can print the partitions of the RDD in spark-shell:
> >
> > ```
> > scala> val data = sc.parallelize(List((1, 2), (1, 1), (2, 3), (2, 1), (1, 4), (3, 5)), 2)
> > scala> data.mapPartitions { _.map { println(_) } } collect
> > [Stage 0:>                                                (0 + 0) / 2]
> > (2,1)
> > (1,4)
> > (3,5)
> > (1,2)
> > (1,1)
> > (2,3)
> > res0: Array[Unit] = Array((), (), (), (), (), ())
> > ```
> >
> > But the same code in Zeppelin does not include the output of the println
> > statement at all :/ I tried both 0.7.1 and master.
> >
> > ```
> > data.mapPartitions { _.map { println(_) } } collect
> > res2: Array[Unit] = Array((), (), (), (), (), ())
> > ```
> >
> > Is that expected, or did I miss something? Please let me know if you have
> > any ideas.
> >
> > --
> > Alex
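Following Jeff's explanation that only driver-side output is captured, one way to see per-partition contents in Zeppelin is to move the printing to the driver: `glom()` gathers each partition into an array on the executors, `collect()` brings those arrays back, and `println` then runs on the driver. This is a minimal sketch, assuming Spark 2.x (`SparkSession`) and a `local[2]` master for a standalone run; `PartitionDump` and `describePartitions` are hypothetical names, and inside spark-shell or a Zeppelin paragraph you would reuse the existing `sc` instead of building a session. Because `collect()` pulls all data into the driver, it only suits small RDDs like the one in this thread.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PartitionDump {
  // glom() turns each partition into one array on the executors;
  // collect() returns those arrays to the driver in partition order,
  // and the description strings are built on the driver.
  def describePartitions[T](rdd: RDD[T]): Array[String] =
    rdd.glom().collect().zipWithIndex.map { case (part, idx) =>
      s"partition $idx: ${part.mkString(", ")}"
    }

  def main(args: Array[String]): Unit = {
    // Standalone local session (assumption); spark-shell and Zeppelin already provide `sc`.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partition-dump")
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    try {
      val data = sc.parallelize(List((1, 2), (1, 1), (2, 3), (2, 1), (1, 4), (3, 5)), 2)
      // println runs on the driver here, so a REPL that captures only
      // driver output should show these lines.
      describePartitions(data).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

Since `collect()` preserves partition order, the lines come out grouped by partition rather than interleaved as in the spark-shell transcript.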