Re: How to get the data url

2017-11-03 Thread 小野圭二
Thank you for your reply, jgp. The URL in my question was just a sample; it could be anything. I mean, let's imagine a multi-user Spark environment. This is just a case model. - I am a watcher on a Spark system. - Some users run their applications on my Spark, and I need to know what URL running on my
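One way a watcher could see which applications are running is Spark's monitoring REST API, exposed under /api/v1 on the driver UI (port 4040 by default) and on the history server (port 18080). A minimal sketch, assuming default ports and that the base URL is reachable:

```python
import json
from urllib.request import urlopen

def applications_url(base="http://localhost:4040"):
    # Spark's monitoring REST API lives under /api/v1 on the driver UI
    # (port 4040 by default) and on the history server (port 18080).
    return base.rstrip("/") + "/api/v1/applications"

def list_applications(base="http://localhost:4040"):
    # Fetch and parse the list of applications known to that UI.
    with urlopen(applications_url(base)) as resp:
        return json.load(resp)
```

Each entry in the returned JSON carries the application id and name, which is the closest thing to "which application is running where" that the UI exposes.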

unable to run spark streaming example

2017-11-03 Thread Imran Rajjad
I am trying out the network word count example and my unit test is producing the below console output with an exception: Exception in thread "dispatcher-event-loop-5" java.lang.NoClassDefFoundError: scala/runtime/AbstractPartialFunction$mcVL$sp at java.lang.ClassLoader.defineClass1(Native Method)
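The missing class scala/runtime/AbstractPartialFunction$mcVL$sp usually points to a Scala binary-version mismatch: the specialized AbstractPartialFunction variants exist in Scala 2.10 but were dropped in 2.11, so a _2.10 spark-streaming artifact on a 2.11 runtime fails exactly like this. Aligning every Spark dependency on one Scala suffix is the usual fix. Separately, the per-batch logic of the word count example is just flatMap/map/reduceByKey; a plain-Python sketch of that pipeline (not the Spark API itself):

```python
from collections import Counter

def word_count(lines):
    # Mirrors the streaming example's pipeline over one batch:
    #   flatMap(split) -> map(word -> (word, 1)) -> reduceByKey(+)
    words = (w for line in lines for w in line.split())
    return dict(Counter(words))
```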

Re: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

2017-11-03 Thread ayan guha
you can use 10 passes over the same dataset and build the data On Fri, Nov 3, 2017 at 9:48 PM, Jean Georges Perrin wrote: > Write a UDF? > > On Oct 31, 2017, at 11:48, Aakash Basu wrote: > > Hey all, > > Any help in the below please? > >
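The "N passes" idea can be sketched outside Spark: walk each row up its parent chain, one level per pass, up to a fixed depth. A minimal plain-Python sketch, assuming rows of (id, parent_id, name) — the thread never shows the real schema:

```python
def hierarchy_paths(rows, max_depth=10):
    # rows: iterable of (id, parent_id, name); parent_id is None for roots.
    # One "pass" per hierarchy level, mirroring the suggestion of
    # re-joining the same dataset up to 10 times.
    parent = {i: p for i, p, _ in rows}
    name = {i: n for i, _, n in rows}
    paths = {}
    for i in parent:
        chain = [i]
        for _ in range(max_depth):
            p = parent.get(chain[-1])
            if p is None:
                break
            chain.append(p)
        # Root-to-leaf list of names for this node.
        paths[i] = [name[x] for x in reversed(chain)]
    return paths
```

In Spark proper each pass would be a self-join on parent_id rather than a dict lookup, but the level-by-level shape is the same.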

Re: pyspark configuration with Jupyter

2017-11-03 Thread Jeff Zhang
You are setting PYSPARK_DRIVER_PYTHON to jupyter; please set it to the python executable instead. anudeep wrote on Fri, Nov 3, 2017 at 7:31 PM: > Hello experts, > > I installed jupyter notebook through Anaconda and set the pyspark driver to use > jupyter notebook. > > I see the below issue when I try to open

pyspark configuration with Jupyter

2017-11-03 Thread anudeep
Hello experts, I installed jupyter notebook through Anaconda and set the pyspark driver to use jupyter notebook. I see the below issue when I try to open pyspark. [anudeepg@datanode2 spark-2.1.0]$ ./bin/pyspark [I 07:29:53.184 NotebookApp] The port is already in use, trying another port. [I
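For reference, the environment variables that control this are PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS; a sketch of both configurations (values assumed, adjust to your install):

```shell
# To launch ./bin/pyspark with a plain Python REPL, leave the driver
# unset or point it at the interpreter:
export PYSPARK_DRIVER_PYTHON=python

# To deliberately run pyspark inside Jupyter instead:
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
```

The "port is already in use" notice in the log above just means another notebook server already holds the default port; Jupyter falls back to the next free one.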

Re: Regarding column partitioning IDs and names as per hierarchical level SparkSQL

2017-11-03 Thread Jean Georges Perrin
Write a UDF? > On Oct 31, 2017, at 11:48, Aakash Basu > wrote: > > Hey all, > > Any help in the below please? > > Thanks, > Aakash. > > > -- Forwarded message -- > From: Aakash Basu
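A UDF here would wrap ordinary row-level logic. As a hedged sketch, assuming slash-separated hierarchy paths (the thread doesn't show the actual schema), the core function one would register with pyspark.sql.functions.udf:

```python
def level_columns(path, max_levels=4, sep="/"):
    # The logic one might wrap in a Spark UDF: split a hierarchical
    # path into a fixed number of per-level values, padding with None.
    parts = path.split(sep) if path else []
    parts = parts[:max_levels]
    return parts + [None] * (max_levels - len(parts))
```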

Re: How to get the data url

2017-11-03 Thread Jean Georges Perrin
I am a little confused by your question… Are you trying to ingest a file from S3? If so… look for net.jgp.labs.spark on GitHub and look for net.jgp.labs.spark.l000_ingestion.l001_csv_in_progress.S3CsvToDataset You can modify the file as the keys are yours… If you want to download first: look
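The referenced repository is Java, but the same ingestion can be sketched in PySpark. The bucket, key, and header option below are assumptions, not the repo's actual code; credentials are left to the standard AWS provider chain:

```python
def s3a_path(bucket, key):
    # Spark reads S3 through the s3a:// scheme (hadoop-aws on the classpath).
    return f"s3a://{bucket}/{key}"

def read_csv_from_s3(spark, bucket, key):
    # spark is an existing SparkSession; credentials come from the usual
    # AWS providers (env vars, instance profile, or hadoop conf).
    return spark.read.option("header", True).csv(s3a_path(bucket, key))
```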

Re: Hi all,

2017-11-03 Thread Jean Georges Perrin
Hi Oren, Why don’t you want to use a GroupBy? You can cache or checkpoint the result and use it in your process, keeping everything in Spark and avoiding save/ingestion... > On Oct 31, 2017, at 08:17, אורן שמון <oren.sha...@gmail.com> wrote: > > I have 2 spark jobs one is pre-process and
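The cache-and-reuse pattern jgp describes can be sketched as a small helper; the column name and aggregation below are placeholders, since the original jobs aren't shown:

```python
def preaggregate(df, key_col="user_id"):
    # Compute the groupBy once, then cache (or checkpoint) the result so
    # downstream steps reuse it without re-reading the source data.
    grouped = df.groupBy(key_col).count()
    return grouped.cache()
```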