Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode
I've put the CSV on the worker node, since the job runs on the worker. I didn't put the CSV on the master because I believe it doesn't run jobs. If I put the CSV on the Zeppelin node with the same path as on the worker, it reads the CSV and writes a _SUCCESS file locally. The job runs on the worker too but doesn't terminate, and the result is saved under a _temporary directory on the worker.

worker - ls -laRt /data/02.csv/

02.csv/:
total 0
drwxr-xr-x. 3 root root    24 Apr 28 09:55 .
drwxr-xr-x. 3 root root    15 Apr 28 09:55 _temporary
drwxr-xr-x. 3 root root    64 Apr 28 09:55 ..

02.csv/_temporary:
total 0
drwxr-xr-x. 5 root root   106 Apr 28 09:56 0
drwxr-xr-x. 3 root root    15 Apr 28 09:55 .
drwxr-xr-x. 3 root root    24 Apr 28 09:55 ..

02.csv/_temporary/0:
total 0
drwxr-xr-x. 5 root root   106 Apr 28 09:56 .
drwxr-xr-x. 2 root root     6 Apr 28 09:56 _temporary
drwxr-xr-x. 2 root root   129 Apr 28 09:56 task_20170428095632_0005_m_00
drwxr-xr-x. 2 root root   129 Apr 28 09:55 task_20170428095516_0002_m_00
drwxr-xr-x. 3 root root    15 Apr 28 09:55 ..

02.csv/_temporary/0/_temporary:
total 0
drwxr-xr-x. 2 root root     6 Apr 28 09:56 .
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..

02.csv/_temporary/0/task_20170428095632_0005_m_00:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:56 .part-0-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:56 part-0-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv
drwxr-xr-x. 2 root root   129 Apr 28 09:56 .

02.csv/_temporary/0/task_20170428095516_0002_m_00:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:55 .part-0-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:55 part-0-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv

zeppelin - ls -laRt 02.csv/

02.csv/:
total 12
drwxr-sr-x. 2 root 1700 4096 Apr 28 09:56 .
-rw-r--r--. 1 root 1700    8 Apr 28 09:56 ._SUCCESS.crc
-rw-r--r--. 1 root 1700    0 Apr 28 09:56 _SUCCESS
drwxrwsr-x. 5 root 1700 4096 Apr 28 09:56 ..

On Wed, May 10, 2017 at 14:06, Meethu Mathew <meethu.mat...@flytxt.com> wrote:
> Try putting the csv in the same path in all the nodes, or in a mount point
> path which is accessible by all the nodes.
>
> Regards,
> Meethu Mathew
>
> On Wed, May 10, 2017 at 3:36 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
>> Yes, I already tested with spark-shell and pyspark, with the same result.
>>
>> Can't I use the Linux filesystem to read the CSV, such as file:///data/file.csv?
>> My understanding is that the job is sent and interpreted on the worker, isn't it?
>>
>> Thanks.
>>
>> On Tue, May 9, 2017 at 20:23, Jongyoul Lee <jongy...@gmail.com> wrote:
>>> Could you test if it works with spark-shell?
>>>
>>> On Sun, May 7, 2017 at 5:22 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I have a standalone cluster, one master and one worker, running on
>>>> separate nodes. Zeppelin is running on a separate node too, in client mode.
>>>>
>>>> When I run a notebook that reads a CSV file located on the worker node
>>>> with the Spark-CSV package, Zeppelin tries to read the CSV locally and
>>>> fails because the CSV is on the worker node and not on the Zeppelin node.
>>>>
>>>> Is this the expected behavior?
>>>>
>>>> Thanks.
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
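Meethu's suggestion above (same path on every node, or a shared mount) pairs with Sofiane's question about file:///data/file.csv. A minimal sketch of the idea, assuming nothing beyond the paths in the thread — the helper name is mine, not part of any Spark or Zeppelin API:

```python
from pathlib import Path

def to_spark_file_uri(path: str) -> str:
    """Return the explicit file:// URI for a local path.

    Without a scheme, Spark resolves a bare path against the default
    filesystem of whichever JVM opens it (the driver for the initial
    listing, each executor for the actual read). That is why, with
    local paths, the file must exist at the same absolute path on
    every node of the standalone cluster.
    """
    p = Path(path)
    if not p.is_absolute():
        raise ValueError("Spark file URIs need absolute paths")
    return "file://" + str(p)

# The URI the notebook would pass to sqlContext.read.load:
print(to_spark_file_uri("/data/01.csv"))  # file:///data/01.csv
```

The explicit scheme only changes which filesystem is consulted; it does not make a file on one node visible to the others, so the same-path-everywhere (or shared-mount) advice still applies.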
Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode
Hi,

I have a standalone cluster, one master and one worker, running on separate nodes. Zeppelin is running on a separate node too, in client mode.

When I run a notebook that reads a CSV file located on the worker node with the Spark-CSV package, Zeppelin tries to read the CSV locally and fails because the CSV is on the worker node and not on the Zeppelin node.

Is this the expected behavior?

Thanks.
Re: Running a notebook in a standalone cluster mode issues
Hi Moon,

Great, I am keen to see ZEPPELIN-2040 resolved soon. But meanwhile, is there any workaround?

Thanks.
Sofiane

On Wed, May 3, 2017 at 20:40, moon soo Lee <m...@apache.org> wrote:
> Zeppelin doesn't need to be installed on every worker.
> You can think of the way SparkInterpreter in Zeppelin works as very similar
> to spark-shell (which works in client mode), until ZEPPELIN-2040 is resolved.
>
> Therefore, if spark-shell works on a machine with your standalone cluster,
> Zeppelin will work on the same machine with the standalone cluster.
>
> Thanks,
> moon
>
> On Wed, May 3, 2017 at 2:28 PM Sofiane Cherchalli <sofian...@gmail.com> wrote:
>> Hi Moon,
>>
>> So in my case, if I have a standalone or YARN cluster, the workaround
>> would be to install Zeppelin alongside every worker, proxy them, and run
>> each Zeppelin in client mode?
>>
>> Thanks,
>> Sofiane
>>
>> On Wed, May 3, 2017 at 19:12, moon soo Lee <m...@apache.org> wrote:
>>> Hi,
>>>
>>> Zeppelin does not support cluster-mode deploy at the moment.
>>> Fortunately, there will be support for cluster mode, soon!
>>> Please keep an eye on https://issues.apache.org/jira/browse/ZEPPELIN-2040.
>>>
>>> Thanks,
>>> moon
>>>
>>> On Wed, May 3, 2017 at 11:00 AM Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>>> Shall I configure a remote interpreter for my notebook to run on the worker?
>>>>
>>>> Mayday!
>>>>
>>>> On Wed, May 3, 2017 at 4:18 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>>>> What port does the remote interpreter use?
>>>>>
>>>>> On Wed, May 3, 2017 at 2:14 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>>>>> Hi Moon and all,
>>>>>>
>>>>>> I have a standalone cluster with one master and one worker. I submit
>>>>>> jobs through Zeppelin. Master, worker, and Zeppelin each run in a
>>>>>> separate container.
>>>>>>
>>>>>> My zeppelin-env.sh:
>>>>>>
>>>>>> # spark home
>>>>>> export SPARK_HOME=/usr/local/spark
>>>>>>
>>>>>> # set hadoop conf dir
>>>>>> export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
>>>>>>
>>>>>> # set options to pass to the spark-submit command
>>>>>> export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0 --deploy-mode cluster"
>>>>>>
>>>>>> # worker memory
>>>>>> export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g -Dspark.submit.deployMode=cluster"
>>>>>>
>>>>>> # master
>>>>>> export MASTER="spark://:7077"
>>>>>>
>>>>>> My notebook code is very simple. It reads a CSV and writes it again
>>>>>> into the directory /data, previously created:
>>>>>>
>>>>>> %spark.pyspark
>>>>>> def read_input(fin):
>>>>>>     '''
>>>>>>     Read input file from filesystem and return dataframe
>>>>>>     '''
>>>>>>     df = sqlContext.read.load(fin, format='com.databricks.spark.csv',
>>>>>>                               mode='PERMISSIVE', header='false', inferSchema='true')
>>>>>>     return df
>>>>>>
>>>>>> def write_output(df, fout):
>>>>>>     '''
>>>>>>     Write dataframe to filesystem
>>>>>>     '''
>>>>>>     df.write.mode('overwrite').format('com.databricks.spark.csv').options(delimiter=',', header='true').save(fout)
>>>>>>
>>>>>> data_in = '/data/01.csv'
>>>>>> data_out = '/data/02.csv'
>>>>>> df = read_input(data_in)
>>>>>> newdf = del_columns(df)
>>>>>> write_output(newdf, data_out)
>>>>>>
>>>>>> I used --deploy-mode *cluster* so that the driver is run on the
>>>>>> worker, in order to read the CSV from the /data directory and not
>>>>>> from Zeppelin. When running the notebook it complains that
>>>>>> /opt/zeppe
Re: Running a notebook in a standalone cluster mode issues
Hi Moon,

So in my case, if I have a standalone or YARN cluster, the workaround would be to install Zeppelin alongside every worker, proxy them, and run each Zeppelin in client mode?

Thanks,
Sofiane

On Wed, May 3, 2017 at 19:12, moon soo Lee <m...@apache.org> wrote:
> Hi,
>
> Zeppelin does not support cluster-mode deploy at the moment. Fortunately,
> there will be support for cluster mode, soon!
> Please keep an eye on https://issues.apache.org/jira/browse/ZEPPELIN-2040.
>
> Thanks,
> moon
>
> On Wed, May 3, 2017 at 11:00 AM Sofiane Cherchalli <sofian...@gmail.com> wrote:
>> Shall I configure a remote interpreter for my notebook to run on the worker?
>>
>> Mayday!
>>
>> On Wed, May 3, 2017 at 4:18 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>> What port does the remote interpreter use?
>>>
>>> On Wed, May 3, 2017 at 2:14 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>>> Hi Moon and all,
>>>>
>>>> I have a standalone cluster with one master and one worker. I submit
>>>> jobs through Zeppelin. Master, worker, and Zeppelin each run in a
>>>> separate container.
>>>>
>>>> My zeppelin-env.sh:
>>>>
>>>> # spark home
>>>> export SPARK_HOME=/usr/local/spark
>>>>
>>>> # set hadoop conf dir
>>>> export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
>>>>
>>>> # set options to pass to the spark-submit command
>>>> export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0 --deploy-mode cluster"
>>>>
>>>> # worker memory
>>>> export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g -Dspark.submit.deployMode=cluster"
>>>>
>>>> # master
>>>> export MASTER="spark://:7077"
>>>>
>>>> My notebook code is very simple. It reads a CSV and writes it again
>>>> into the directory /data, previously created:
>>>>
>>>> %spark.pyspark
>>>> def read_input(fin):
>>>>     '''
>>>>     Read input file from filesystem and return dataframe
>>>>     '''
>>>>     df = sqlContext.read.load(fin, format='com.databricks.spark.csv',
>>>>                               mode='PERMISSIVE', header='false', inferSchema='true')
>>>>     return df
>>>>
>>>> def write_output(df, fout):
>>>>     '''
>>>>     Write dataframe to filesystem
>>>>     '''
>>>>     df.write.mode('overwrite').format('com.databricks.spark.csv').options(delimiter=',', header='true').save(fout)
>>>>
>>>> data_in = '/data/01.csv'
>>>> data_out = '/data/02.csv'
>>>> df = read_input(data_in)
>>>> newdf = del_columns(df)
>>>> write_output(newdf, data_out)
>>>>
>>>> I used --deploy-mode *cluster* so that the driver is run on the worker,
>>>> in order to read the CSV from the /data directory and not from Zeppelin.
>>>> When running the notebook it complains that
>>>> /opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar is missing:
>>>>
>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>> Ivy Default Cache set to: /root/.ivy2/cache
>>>> The jars for the packages stored in: /root/.ivy2/jars
>>>> :: loading settings :: url = jar:file:/opt/spark-2.1.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
>>>> com.databricks#spark-csv_2.11 added as a dependency
>>>> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>>>>     confs: [default]
>>>>     found com.databricks#spark-csv_2.11;1.5.0 in central
>>>>     found org.apache.commons#commons-csv;1.1 in central
>>>>     found com.univocity#univocity-parsers;1.5.1 in central
>>>> :: resolution report :: resolve 310ms :: artifacts dl 6ms
>>>>     :: modules in use:
>>>>     com.databricks#spark-csv_2.11;1.5.0 from central in [default]
>>>>     com.univocity#univocity-parsers;1.5.1 from central in [default]
>>>>     org.apache.commons#commons-csv;1.1 from central in [default]
>>>>     ---------------------------------------------------------------------
>>>>     |                  |            modules            ||   artifacts   |
>>>>     |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>>>>     ---------------------------------------------------------------------
>>>>     |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
>>>>
Re: Running a notebook in a standalone cluster mode issues
What port does the remote interpreter use?

On Wed, May 3, 2017 at 2:14 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
> Hi Moon and all,
>
> I have a standalone cluster with one master and one worker. I submit jobs
> through Zeppelin. Master, worker, and Zeppelin each run in a separate container.
>
> My zeppelin-env.sh:
>
> # spark home
> export SPARK_HOME=/usr/local/spark
>
> # set hadoop conf dir
> export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
>
> # set options to pass to the spark-submit command
> export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0 --deploy-mode cluster"
>
> # worker memory
> export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g -Dspark.submit.deployMode=cluster"
>
> # master
> export MASTER="spark://:7077"
>
> My notebook code is very simple. It reads a CSV and writes it again into
> the directory /data, previously created:
>
> %spark.pyspark
> def read_input(fin):
>     '''
>     Read input file from filesystem and return dataframe
>     '''
>     df = sqlContext.read.load(fin, format='com.databricks.spark.csv',
>                               mode='PERMISSIVE', header='false', inferSchema='true')
>     return df
>
> def write_output(df, fout):
>     '''
>     Write dataframe to filesystem
>     '''
>     df.write.mode('overwrite').format('com.databricks.spark.csv').options(delimiter=',', header='true').save(fout)
>
> data_in = '/data/01.csv'
> data_out = '/data/02.csv'
> df = read_input(data_in)
> newdf = del_columns(df)
> write_output(newdf, data_out)
>
> I used --deploy-mode *cluster* so that the driver is run on the worker,
> in order to read the CSV from the /data directory and not from Zeppelin.
> When running the notebook it complains that
> /opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar is missing:
>
> org.apache.zeppelin.interpreter.InterpreterException:
> Ivy Default Cache set to: /root/.ivy2/cache
> The jars for the packages stored in: /root/.ivy2/jars
> :: loading settings :: url = jar:file:/opt/spark-2.1.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
> com.databricks#spark-csv_2.11 added as a dependency
> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>     confs: [default]
>     found com.databricks#spark-csv_2.11;1.5.0 in central
>     found org.apache.commons#commons-csv;1.1 in central
>     found com.univocity#univocity-parsers;1.5.1 in central
> :: resolution report :: resolve 310ms :: artifacts dl 6ms
>     :: modules in use:
>     com.databricks#spark-csv_2.11;1.5.0 from central in [default]
>     com.univocity#univocity-parsers;1.5.1 from central in [default]
>     org.apache.commons#commons-csv;1.1 from central in [default]
>     ---------------------------------------------------------------------
>     |                  |            modules            ||   artifacts   |
>     |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
>     ---------------------------------------------------------------------
>     |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
>     ---------------------------------------------------------------------
> :: retrieving :: org.apache.spark#spark-submit-parent
>     confs: [default]
>     0 artifacts copied, 3 already retrieved (0kB/8ms)
> Running Spark using the REST application submission protocol.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/lib/interpreter/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Warning: Master endpoint spark://spark-drone-master-sofiane.autoetl.svc.cluster.local:7077 was not a REST server. Falling back to legacy submission gateway instead.
> Ivy Default Cache set to: /root/.ivy2/cache
> The jars for the packages stored in: /root/.ivy2/jars
> com.databricks#spark-csv_2.11 added as a dependency
> :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
>     confs: [default]
>     found com.databricks#spark-csv_2.11;1.5.0 in central
>     found org.apache.commons#commons-csv;1.1 in central
>     found com.univocity#univocity-parsers;1.5.1 in central
> :: resolution report :: resolve 69ms :: artifacts dl 5ms
>     :: modules in use:
>     com.databricks#spark-csv_2.11;1.5.0 from central in [default]
>     com.univocity#univocity-parsers;1.5.1 from central in [default]
>     org.
Running a notebook in a standalone cluster mode issues
Hi Moon and all,

I have a standalone cluster with one master and one worker. I submit jobs through Zeppelin. Master, worker, and Zeppelin each run in a separate container.

My zeppelin-env.sh:

# spark home
export SPARK_HOME=/usr/local/spark

# set hadoop conf dir
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop

# set options to pass to the spark-submit command
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0 --deploy-mode cluster"

# worker memory
export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g -Dspark.submit.deployMode=cluster"

# master
export MASTER="spark://:7077"

My notebook code is very simple. It reads a CSV and writes it again into the directory /data, previously created:

%spark.pyspark
def read_input(fin):
    '''
    Read input file from filesystem and return dataframe
    '''
    df = sqlContext.read.load(fin, format='com.databricks.spark.csv',
                              mode='PERMISSIVE', header='false', inferSchema='true')
    return df

def write_output(df, fout):
    '''
    Write dataframe to filesystem
    '''
    df.write.mode('overwrite').format('com.databricks.spark.csv').options(delimiter=',', header='true').save(fout)

data_in = '/data/01.csv'
data_out = '/data/02.csv'
df = read_input(data_in)
newdf = del_columns(df)
write_output(newdf, data_out)

I used --deploy-mode *cluster* so that the driver is run on the worker, in order to read the CSV from the /data directory and not from Zeppelin. When running the notebook it complains that /opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar is missing:

org.apache.zeppelin.interpreter.InterpreterException:
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/opt/spark-2.1.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-csv_2.11;1.5.0 in central
    found org.apache.commons#commons-csv;1.1 in central
    found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 310ms :: artifacts dl 6ms
    :: modules in use:
    com.databricks#spark-csv_2.11;1.5.0 from central in [default]
    com.univocity#univocity-parsers;1.5.1 from central in [default]
    org.apache.commons#commons-csv;1.1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 3 already retrieved (0kB/8ms)
Running Spark using the REST application submission protocol.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/zeppelin-0.7.1/lib/interpreter/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Warning: Master endpoint spark://spark-drone-master-sofiane.autoetl.svc.cluster.local:7077 was not a REST server. Falling back to legacy submission gateway instead.
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
com.databricks#spark-csv_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-csv_2.11;1.5.0 in central
    found org.apache.commons#commons-csv;1.1 in central
    found com.univocity#univocity-parsers;1.5.1 in central
:: resolution report :: resolve 69ms :: artifacts dl 5ms
    :: modules in use:
    com.databricks#spark-csv_2.11;1.5.0 from central in [default]
    com.univocity#univocity-parsers;1.5.1 from central in [default]
    org.apache.commons#commons-csv;1.1 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   3   |   0   |   0   |   0   ||   3   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 3 already retrieved (0kB/4ms)
java.nio.file.NoSuchFileException: /opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at
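The NoSuchFileException above is consistent with moon's point earlier in the thread: with --deploy-mode cluster the driver is launched on a worker, and the worker has no Zeppelin installation, so the zeppelin-spark jar the submission references does not exist there. A hedged sketch of the same zeppelin-env.sh with the cluster-mode flags dropped (paths taken from the thread; the master hostname is an assumption, since the original left it blank — verify both against your own layout):

```shell
# zeppelin-env.sh -- client-mode variant of the configuration above.
# --deploy-mode cluster and -Dspark.submit.deployMode=cluster are removed,
# so the driver runs next to Zeppelin, where
# /opt/zeppelin-0.7.1/interpreter/spark/zeppelin-spark_2.11-0.7.1.jar exists.

export SPARK_HOME=/usr/local/spark
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop

# Keep the Spark-CSV package; drop --deploy-mode cluster.
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0"

export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g"

# The master URL needs a real host here ("master" is a placeholder).
export MASTER="spark://master:7077"
```

With the driver back on the Zeppelin node, the input file then has to be reachable from that node as well — the same path on all nodes, or a shared mount.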
zeppelin 0.7.1 and Spark cluster standalone - Reading and writing csv
Hi,

I have a Spark cluster in standalone mode with one worker. Each of Zeppelin, the Spark master, and the Spark slave runs in its own Docker container. I am trying to read and write a CSV from a notebook, but I'm having issues.

First, my zeppelin-env.sh:

# spark home
export SPARK_HOME=/opt/spark-2.1.0

# set hadoop conf dir
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop

# set options to pass to the spark-submit command
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.11:1.5.0"

# worker memory
export ZEPPELIN_JAVA_OPTS="-Dspark.driver.memory=7g"

# master
export MASTER="spark://master:7077"

The notebook:

%spark.pyspark
data_in = '/data/01.csv'
data_out = '/data/02.csv'

def read_input(fin):
    '''
    Read input file from filesystem and return dataframe
    '''
    df = sqlContext.read.load(fin, format='com.databricks.spark.csv',
                              mode='PERMISSIVE', header='false', inferSchema='true')
    return df

def write_output(df, fout):
    '''
    Write dataframe to filesystem
    '''
    df.write.mode('overwrite').format('com.databricks.spark.csv').options(delimiter=',', header='true').save(fout)

df = read_input(data_in)
write_output(df, data_out)

I copied the /data/01.csv file onto the Spark worker. When I run the notebook it fails, complaining that /data/01.csv was not found in the Zeppelin container:

Traceback (most recent call last):
  File "/opt/spark-2.1.0/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark-2.1.0/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o46.load.
: org.apache.spark.sql.AnalysisException: Path does not exist: file:/data/01.csv;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-5030675375956428180.py", line 337, in
    exec(code)
  File "", line 1, in
  File "", line 5, in read_input
  File "/opt/spark-2.1.0/python/pyspark/sql/readwriter.py", line 149, in load
    return self._df(self._jreader.load(path))
  File "/opt/spark-2.1.0/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/spark-2.1.0/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'Path does not exist: file:/data/01.csv;'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-5030675375956428180.py", line 349, in
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/opt/spark-2.1.0/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/spark-2.1.0/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o46.load.
: org.apache.spark.sql.AnalysisException: Path does not exist: file:/data/01.csv;
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at
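The AnalysisException above is raised while the driver (the Zeppelin container, in client mode) resolves the path, before any worker is involved. A sketch of a pre-flight paragraph that could run before read_input() to make this visible — the helper name and wording are mine, not part of the thread:

```python
import os

def preflight_check(path: str) -> str:
    """Report whether the driver side can see a local input file.

    Spark's DataFrameReader resolves a local path on the driver first,
    which in this setup is the Zeppelin container -- exactly where
    'Path does not exist: file:/data/01.csv' is raised even though the
    file was copied onto the worker.
    """
    if os.path.exists(path):
        return "visible on driver: %s" % path
    return ("missing on driver: %s -- copy it into this container too, "
            "or put it on a mount shared by Zeppelin and the workers" % path)

print(preflight_check('/data/01.csv'))
```

Running this in a %spark.pyspark paragraph before the load distinguishes "the file is nowhere the driver can see" from genuine reader problems.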
Re: Helium - Algorithm
Thanks Moon. I'll have a look at it.

On Wed, Apr 19, 2017 at 6:06, moon soo Lee <m...@apache.org> wrote:
> Hi,
>
> If you take a look at the Helium Proof of Concept video [1] in the
> proposal [2], you'll see one Helium app load data and then visualize the
> data with another Helium app (from 1 min).
>
> So I would say it's totally possible, although we might need some
> improvement to do it more smoothly.
>
> What do you think?
>
> Thanks,
> moon
>
> [1] https://www.youtube.com/watch?time_continue=10&v=8Wdc70e6QVI
> [2] https://cwiki.apache.org/confluence/display/ZEPPELIN/Helium+proposal
>
> On Tue, Apr 18, 2017 at 7:49 AM Sofiane Cherchalli <sofian...@gmail.com> wrote:
>> Hi,
>>
>> Is it possible to use Helium to just implement algorithms, without views?
>> My idea is to have a catalog of algorithms that could be chained together.
>> Each algorithm would read input from HDFS, process it, and write output
>> to HDFS. This could be very useful for data preprocessing.
>>
>> Any thoughts or suggestions about that?
>>
>> Thanks.
>> Sofiane
Helium - Algorithm
Hi,

Is it possible to use Helium to just implement algorithms, without views? My idea is to have a catalog of algorithms that could be chained together. Each algorithm would read input from HDFS, process it, and write output to HDFS. This could be very useful for data preprocessing.

Any thoughts or suggestions about that?

Thanks.
Sofiane
Re: Zeppelin Notebook API - reporting errors
Hi Moon,

I just created the issue: ZEPPELIN-2345 <https://issues.apache.org/jira/browse/ZEPPELIN-2345>

Best regards,
Sofiane

On Sat, Apr 1, 2017 at 1:53 AM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
> No problem. I'll file the issue.
>
> Thanks
>
> On Fri, Mar 31, 2017 at 22:32, moon soo Lee <m...@apache.org> wrote:
>> Thanks for the suggestion.
>>
>> As far as I know, there's no related issue in JIRA.
>> Do you mind creating one?
>>
>> Thanks,
>> moon
>>
>> On Fri, Mar 31, 2017 at 2:49 AM Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>> Any taker? Is this an issue or expected behaviour?
>>>
>>> Thanks.
>>>
>>> On Thu, Mar 30, 2017 at 9:52 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I am running notebooks through the Notebook API by running synchronously
>>>> every paragraph of the notebook, but it seems that if something fails
>>>> during the execution of a paragraph, due to an exception for instance,
>>>> the API returns a 500 Server Error. To get more detail on the exception,
>>>> one has to open the notebook to see the stack trace. Is this the expected
>>>> behaviour? Shouldn't the API catch the exception and return the 500 with
>>>> the stack trace text of the exception?
>>>>
>>>> Thanks.
>>>> Sofiane
Re: Zeppelin Notebook API - reporting errors
Any taker? Is this an issue or expected behaviour?

Thanks.

On Thu, Mar 30, 2017 at 9:52 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
> Hi,
>
> I am running notebooks through the Notebook API by running synchronously
> every paragraph of the notebook, but it seems that if something fails
> during the execution of a paragraph, due to an exception for instance,
> the API returns a 500 Server Error. To get more detail on the exception,
> one has to open the notebook to see the stack trace. Is this the expected
> behaviour? Shouldn't the API catch the exception and return the 500 with
> the stack trace text of the exception?
>
> Thanks.
> Sofiane
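Until the API itself returns the stack trace, a caller can at least try to salvage whatever detail the response body carries. This is a sketch, not Zeppelin's documented behaviour: the response shape it probes ('status'/'message'/'body' with an error field) is an assumption about the 0.7.x REST responses, and the fallback is the raw text, which is all the 500 described above gives you today.

```python
import json

def extract_error(response_text: str) -> str:
    """Best-effort extraction of error detail from a paragraph-run response.

    Assumes (not guaranteed) that a failed run's JSON nests its payload
    under 'body' with an error field; otherwise falls back to 'message'
    or to the raw response text.
    """
    try:
        doc = json.loads(response_text)
    except ValueError:
        return response_text  # not JSON at all, e.g. an HTML error page
    body = doc.get("body")
    if isinstance(body, dict):
        for key in ("errorMessage", "msg", "results"):
            if key in body:
                return str(body[key])
    return doc.get("message", response_text)

# A failed-run payload of the assumed shape:
sample = ('{"status": "INTERNAL_SERVER_ERROR", "message": "...", '
          '"body": {"errorMessage": "NameError: name \'del_columns\' is not defined"}}')
print(extract_error(sample))
```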
Angular FrontEnd API - Drag and Drop
Hi,

Would it be possible to use the Angular front-end API to, for example, list the notebooks and display them in a Zeppelin paragraph? Also, would it be possible to use drag and drop?

Thanks
Sofiane
Re: Release on 0.7.1 and 0.7.2
Hi Zeppelin team,

What's the release forecast? Shall we expect 0.7.1 or 0.7.2 by Friday?

Thanks.

On Tue, 14 Mar 2017 at 13:09, Jianfeng (Jeff) Zhang wrote:
> +1
>
> Best Regard,
> Jeff Zhang
>
> From: Jun Kim
> Reply-To: "users@zeppelin.apache.org"
> Date: Tuesday, March 14, 2017 at 11:38 AM
> To: "users@zeppelin.apache.org"
> Subject: Re: Release on 0.7.1 and 0.7.2
>
> Cool! I look forward to it!
>
> On Tue, Mar 14, 2017 at 12:31 PM, moon soo Lee wrote:
>> Sounds like a plan!
>>
>> On Mon, Mar 13, 2017 at 8:22 PM Xiaohui Liu wrote:
>>> This is the right action. In fact, the 0.7.0 release bin did not work
>>> for my team. We almost started to use 0.7.1-snapshot immediately after
>>> the 0.7.0 release. I guess many of us are taking the same route. But for
>>> new Zeppelin users, starting with 0.7.0 will give them the wrong first
>>> impression.
>>>
>>> On Tue, 14 Mar 2017 at 10:28 AM, Jongyoul Lee wrote:
>>>> Hi dev and users,
>>>>
>>>> As we released 0.7.0, many users and devs reported a lot of bugs which
>>>> were critical. For that reason, the community, including me, started to
>>>> prepare a new minor release with an umbrella issue [1]. Thanks to
>>>> contributors' efforts, we have resolved some of the issues and have
>>>> reviewed almost all unresolved issues. I want to talk about the new
>>>> minor release at this point. Generally, we resolve all issues reported
>>>> as bugs before we release, but some issues are very critical and cause
>>>> serious problems using Apache Zeppelin. So I think, this time, it's
>>>> better to release 0.7.1 as soon as we can and prepare a new minor
>>>> release with the rest of the unresolved issues. I'd like to start the
>>>> process this Friday, and if some issues are not merged by then, I hope
>>>> they will be included in 0.7.2.
>>>>
>>>> Feel free to talk to me if you have a better plan to improve users'
>>>> experiences.
>>>>
>>>> Regards,
>>>> Jongyoul Lee
>>>>
>>>> [1] https://issues.apache.org/jira/browse/ZEPPELIN-2134
>>>>
>>>> --
>>>> 이종열, Jongyoul Lee, 李宗烈
>>>> http://madeng.net
>
> --
> Taejun Kim
> Data Mining Lab.
> School of Electrical and Computer Engineering
> University of Seoul
Re: Running all paragraphs with dynamic form's value parameters
After playing around with running notes and paragraphs, it seems passing dynamic form values works only with paragraphs, not with notes. It would be nice to be able to run a note with dynamic form values, allowing paragraphs to override them.

On Tue, Feb 28, 2017 at 8:33 PM, Sofiane Cherchalli <sofian...@gmail.com> wrote:
> The API allows running a paragraph with dynamic form values. Example:
>
> curl -sL -X POST -d "{'form_field': 'value'}" http://localhost:8080/api/notebook/run//
>
> Is it possible to run all paragraphs with dynamic form values? For instance:
>
> curl -sL -X POST -d "{'form_field': 'value'}" http://localhost:8080/api/notebook/run/
>
> I can't get it to work... Any hint?
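Given that observation, the workaround is to drive the per-paragraph endpoint in a loop yourself. A sketch of that loop, following the endpoint shape and raw JSON body shown in the curl examples in this thread; the Zeppelin URL, the note/paragraph IDs, and the injectable 'opener' (so the sketch can be exercised without a live server) are all illustrative, not part of the thread:

```python
import json
from urllib import request

ZEPPELIN = "http://localhost:8080"  # assumed Zeppelin URL, as in the thread

def run_note_with_forms(note_id, paragraph_ids, form_values, opener=request.urlopen):
    """Run each paragraph of a note in order, passing dynamic form values.

    Since the run-note endpoint does not accept form values, POST each
    paragraph through the run-paragraph endpoint, which does. Paragraph
    order is the caller's responsibility (the note's own order).
    """
    responses = []
    for pid in paragraph_ids:
        url = "%s/api/notebook/run/%s/%s" % (ZEPPELIN, note_id, pid)
        req = request.Request(url,
                              data=json.dumps(form_values).encode("utf-8"),
                              method="POST")
        responses.append(opener(req))
    return responses
```

Because each POST is synchronous, a paragraph's failure surfaces before the next one runs, which also matches how the thread above describes running notebooks paragraph by paragraph.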
Running all paragraphs with dynamic form's value parameters
The API allows running a paragraph with dynamic form values. Example:

curl -sL -X POST -d "{'form_field': 'value'}" http://localhost:8080/api/notebook/run//

Is it possible to run all paragraphs with dynamic form values? For instance:

curl -sL -X POST -d "{'form_field': 'value'}" http://localhost:8080/api/notebook/run/

I can't get it to work... Any hint?