Re: freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Takeshi Yamamuro
Hi, AFAIK, the blocks of minibatch RDDs are checked after every job finishes, and older blocks are automatically removed (See: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463 ). You can control this behaviour by
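For reference, a minimal sketch of the knobs involved (the app name and durations below are hypothetical): spark.streaming.unpersist lets Spark drop blocks of already-processed minibatch RDDs, and StreamingContext.remember sets how long generated RDDs are kept before they become eligible for cleanup.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    // Let Spark Streaming unpersist already-processed minibatch RDDs (the default)
    val conf = new SparkConf()
      .setAppName("block-cleanup-sketch")
      .set("spark.streaming.unpersist", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    // Keep generated RDDs around at least this long; a shorter window
    // frees "Stream Blocks" memory sooner
    ssc.remember(Minutes(2))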

Re: How to do dashboard reporting in spark

2017-01-19 Thread Jörn Franke
You can use Zeppelin if you want to directly interact with Spark. For traditional tools you have the right idea (any of them works, depending on requirements). See also the lambda architecture. > On 20 Jan 2017, at 08:18, Gaurav1809 wrote: > > Hi All, > > > Once data is

How to do dashboard reporting in spark

2017-01-19 Thread Gaurav1809
Hi All, Once data is stored in data frames, what's next? Where do we go from there? Do we store the data in Hive or an RDBMS (Oracle, MySQL, Teradata)? How do we do dashboard reporting based on the data present in data frames? If there is any BI tool available in the Spark ecosystem, please suggest.
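One common pattern, not from this thread (all table names, URLs, and credentials below are hypothetical): persist the DataFrame somewhere a BI tool can query, either as a Hive table or in an RDBMS over JDBC.

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("reporting-sketch")
      .enableHiveSupport()
      .getOrCreate()
    val df = spark.range(100).toDF("id")   // stand-in for the real data

    // Persist as a Hive table that BI tools can query
    df.write.mode("overwrite").saveAsTable("reporting.daily_metrics")

    // ...or push it to an RDBMS over JDBC
    val props = new Properties()
    props.setProperty("user", "report_user")
    props.setProperty("password", "secret")
    df.write.mode("overwrite")
      .jdbc("jdbc:mysql://dbhost:3306/reports", "daily_metrics", props)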

Re: spark 2.0.2 error when writing to s3

2017-01-19 Thread Palash Gupta
Hi, You need to add the overwrite save mode option to avoid this error. //P.Gupta Sent from Yahoo Mail on Android On Fri, 20 Jan, 2017 at 2:15 am, VND Tremblay, Paul wrote: I have come across a problem when writing CSV files to S3 in Spark 2.0.2. The problem does not exist
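Concretely, a sketch of the suggested fix (bucket and path are hypothetical):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("overwrite-sketch").getOrCreate()
    val df = spark.range(10).toDF("id")   // stand-in data

    // Overwrite existing output instead of failing with "File already exists"
    df.write.mode(SaveMode.Overwrite).csv("s3://your-bucket/revenue_model/")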

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Oshadha Gunawardena
On Fri, Jan 20, 2017 at 11:26 AM, Gavin Yue wrote: > PST or EST? > > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > > On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 com>

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Gavin Yue
PST or EST? > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > >> On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 >> wrote: >> Get Outlook for Android >> Happiest Minds Disclaimer >> This

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread ayan guha
Sure...we will wait :) :) Just kidding On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 wrote: > Get Outlook for Android > -- > Happiest Minds Disclaimer > > This message is for the sole use of the intended

Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Manohar753
Get Outlook for Android Happiest Minds Disclaimer This message is for the sole use of the intended recipient(s) and may contain confidential, proprietary or legally privileged information. Any unauthorized review, use, disclosure or

Re: Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Thanks Kai. I am getting the following message, and it gets stuck, when I run sbt. Any idea? "Set current project to spark-parent (in build file:/home/cloudera/spark/)" Details attached. Regards, Deepu Raj +61 414 707 319 On Fri, 20 Jan 2017 10:27:16 +1100, Kai Jiang wrote: Hi

Re: spark 2.0.2 error when writing to s3

2017-01-19 Thread Takeshi Yamamuro
Hi, Do you get the same exception in v2.1.0 as well? Anyway, I think I saw another guy reporting the same error: https://www.mail-archive.com/user@spark.apache.org/msg60882.html // maropu On Fri, Jan 20, 2017 at 5:15 AM, VND Tremblay, Paul wrote: > I have come across a

Re: Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-19 Thread shyla deshpande
There was an issue connecting to Kafka; once that was fixed, the Spark app works. Hope this helps someone. Thanks On Mon, Jan 16, 2017 at 7:58 AM, shyla deshpande wrote: > Hello, > I checked the log file on the worker node and don't see any error there. > This is the

Non-linear (curved?) regression line

2017-01-19 Thread Ganesh
Has anyone worked on non-linear/curved regression lines with Apache Spark? This seems like such a trivial issue, but I have given up after experimenting for nearly two weeks. The plot line is as below and the raw data is in the table at the end. I just can't get Spark ML to give decent
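One way to fit a curved line with Spark ML is to polynomially expand the feature vector and fit an ordinary linear regression on the expanded features; a sketch with toy data (not the poster's), assuming Spark 2.x:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.{PolynomialExpansion, VectorAssembler}
    import org.apache.spark.ml.regression.LinearRegression
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("poly-regression-sketch").getOrCreate()
    import spark.implicits._

    // Toy data roughly following y = x^2 (stand-in for the real table)
    val data = Seq((1.0, 1.1), (2.0, 4.2), (3.0, 8.9), (4.0, 16.3), (5.0, 24.8))
      .toDF("x", "y")

    val assembler = new VectorAssembler().setInputCols(Array("x")).setOutputCol("features")
    val poly = new PolynomialExpansion()
      .setInputCol("features").setOutputCol("polyFeatures").setDegree(3)
    val lr = new LinearRegression().setFeaturesCol("polyFeatures").setLabelCol("y")

    val model = new Pipeline().setStages(Array(assembler, poly, lr)).fit(data)
    model.transform(data).select("x", "y", "prediction").show()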

Re: Executors - running out of memory

2017-01-19 Thread sanat kumar Patnaik
Please try and play with spark-defaults.conf for EMR. Dynamic allocation = true is there by default for EMR 4.4 and above. What is the EMR version you are using? http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#d0e20458 On Thu, Jan 19, 2017 at 5:02 PM, Venkata D
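For reference, the kind of properties worth experimenting with (values are hypothetical; the same keys can go in spark-defaults.conf on EMR). Raising the YARN memory overhead is the usual first step when containers are killed for exceeding memory limits:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.executor.memory", "8g")
      // off-heap headroom that YARN accounts for on top of the executor heap
      .set("spark.yarn.executor.memoryOverhead", "2048")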

Re: Spark Source Code Configuration

2017-01-19 Thread Kai Jiang
Hi Deepu, Hope this page can give you some help: http://spark.apache.org/developer-tools.html Best, Kai On Thu, Jan 19, 2017, 14:51 Deepu Raj wrote: > Hi, > > Is there any article/docs/support to set up the Apache Spark source code on > Eclipse/IntelliJ? > > I have tried

Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Hi, Is there any article/docs/support to set up the Apache Spark source code on Eclipse/IntelliJ? I have tried setting up the source code by importing it from Git and using Maven, but I am getting a lot of compilation errors. #suggestions Regards, Deepu Raj +61 414 707 319

Re: Executors - running out of memory

2017-01-19 Thread Venkata D
blondowski, How big is your JSON file? Is it possible to post the Spark params or configuration here? That might give some idea about the issue. Thanks On Thu, Jan 19, 2017 at 4:21 PM, blondowski wrote: > Please bear with me..I'm fairly new to spark. Running

Re: How to save spark-ML model in Java?

2017-01-19 Thread Xiaomeng Wan
cv.fit is going to give you a CrossValidatorModel. If you want to extract the real model built, you need to do: val cvModel = cv.fit(data) val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel] val model = plmodel.stages(2).asInstanceOf[whatever_model] Then you can call model.save
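Spelled out as a fuller sketch, assuming the cv and data from the snippet above, and assuming the pipeline's third stage is a random forest (the stage index, model class, and save paths are hypothetical; adjust to your pipeline):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.RandomForestClassificationModel
    import org.apache.spark.ml.tuning.CrossValidatorModel

    val cvModel: CrossValidatorModel = cv.fit(data)
    val plModel = cvModel.bestModel.asInstanceOf[PipelineModel]
    val rfModel = plModel.stages(2).asInstanceOf[RandomForestClassificationModel]

    // Persist just the extracted stage (Spark >= 2.0)...
    rfModel.save("hdfs:///models/best-rf")
    // ...or the whole best pipeline
    plModel.save("hdfs:///models/best-pipeline")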

Executors - running out of memory

2017-01-19 Thread blondowski
Please bear with me... I'm fairly new to Spark. Running pyspark 2.0.1 on AWS EMR (6-node cluster with 475GB of RAM). We have a job that creates a dataframe from JSON files, then does some manipulation (adds columns) and then calls a UDF. The job fails on the UDF call with Container killed by YARN

dataset aggregators with kryo encoder very slow

2017-01-19 Thread Koert Kuipers
we just converted a job from RDD to Dataset. the job does a single map-red phase using aggregators. we are seeing very bad performance for the Dataset version, about 10x slower. in the Dataset version we use kryo encoders for some of the aggregators. based on some basic profiling of spark in
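For context, the general shape of a Dataset aggregator with a kryo-encoded buffer (a toy average, not the poster's job): with Encoders.kryo the buffer object is fully serialized and deserialized on every update, while primitive and product encoders can operate on Spark's internal binary format, which is one plausible source of a gap like this.

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
    import org.apache.spark.sql.expressions.Aggregator

    case class Buf(sum: Long, count: Long)

    // Toy aggregator computing an average with a kryo-encoded buffer
    object AvgAgg extends Aggregator[Long, Buf, Double] {
      def zero: Buf = Buf(0L, 0L)
      def reduce(b: Buf, a: Long): Buf = Buf(b.sum + a, b.count + 1)
      def merge(x: Buf, y: Buf): Buf = Buf(x.sum + y.sum, x.count + y.count)
      def finish(b: Buf): Double = if (b.count == 0) 0.0 else b.sum.toDouble / b.count
      def bufferEncoder: Encoder[Buf] = Encoders.kryo[Buf]   // ser/de on every row
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

    val spark = SparkSession.builder().appName("agg-sketch").getOrCreate()
    import spark.implicits._
    spark.range(1000).as[Long].select(AvgAgg.toColumn).show()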

spark 2.0.2 error when writing to s3

2017-01-19 Thread VND Tremblay, Paul
I have come across a problem when writing CSV files to S3 in Spark 2.0.2. The problem does not exist in Spark 1.6. 19:09:20 Caused by: java.io.IOException: File already exists:s3://stx-apollo-pr-datascience-internal/revenue_model/part-r-00025-c48a0d52-9600-4495-913c-64ae6bf888bd.csv My code

Re: How to save spark-ML model in Java?

2017-01-19 Thread Minudika Malshan
Hi, Thanks Rezaul and Asher Krim. The method suggested by Rezaul works fine for NaiveBayes but still fails for RandomForest and the multilayer perceptron classifier. Everything is saved properly until this stage. CrossValidator cv = new CrossValidator() .setEstimator(pipeline)

freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Andrew Milkowski
Hello. Using Spark 2.0.2, while running a sample streaming app with Kinesis, I noticed (in the admin UI Storage tab) that "Stream Blocks" for each worker keeps climbing. Also (on the same UI page), in the Blocks section, I see blocks such as input-0-1484753367056 below that are marked as Memory Serialized

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
Thanks Sidney From: Sidney Feiner Sent: Thursday, January 19, 2017 9:52 AM To: jeff saremi Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Every executor creates a directory with your submitted files and

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I wish someone added this to the documentation. From: jeff saremi Sent: Thursday, January 19, 2017 9:56 AM To: Sidney Feiner Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Thanks Sidney

Re: Spark-submit: where do --files go?

2017-01-19 Thread Sidney Feiner
Every executor creates a directory with your submitted files, and you can access every file's absolute path with the following: val fullFilePath = SparkFiles.get(fileName) On Jan 19, 2017 19:35, jeff saremi wrote: I'd like to know how -- From within Java/spark --
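A minimal sketch of the pattern (the file name is hypothetical), assuming the job was submitted with something like spark-submit --files /local/path/lookup.txt:

    import org.apache.spark.SparkFiles

    // Resolves to the absolute path of the shipped copy on each executor
    val path = SparkFiles.get("lookup.txt")
    val lines = scala.io.Source.fromFile(path).getLines().toList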

Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I'd like to know how -- from within Java/Spark -- I can access the dependent files which I deploy using the "--files" option on the command line.

[SparkStreaming] SparkStreaming not allowing to do parallelize within a transform operation to generate a new RDD

2017-01-19 Thread Nipun Arora
Hi All, Can anyone suggest a way to create and "add to an RDD", as I describe below, in a transform operation? I found that the error I observe goes away if I comment out "ssc.checkpoint()". However, I need checkpointing in later stages. I would really appreciate any help. Thanks Nipun
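One shape this often takes (a sketch; the stream and extraRecords below are hypothetical, and whether it resolves this particular error depends on what the closure captures): obtain the SparkContext from the batch RDD inside transform, rather than referencing the StreamingContext, so the closure stays serializable for checkpointing.

    import org.apache.spark.streaming.dstream.DStream

    def withExtras(stream: DStream[String], extraRecords: Seq[String]): DStream[String] =
      stream.transform { rdd =>
        // rdd.sparkContext avoids capturing the non-serializable StreamingContext
        rdd.union(rdd.sparkContext.parallelize(extraRecords))
      }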

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Thanks Kuan for insight. Much appreciated. Mich Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordpress.com

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Kuan Feng
Greetings, Dr Mich Talebzadeh, This is Kuan from the IBM Platform team. Thank you for your interest in the Platform Symphony and Spark products. I'm writing this mail to clarify the "EGO-YARN" in that blog post you were referring to. EGO is an enterprise resource orchestration component

Re: Old version of Spark [v1.2.0]

2017-01-19 Thread Luciano Resende
The download page has been updated; hopefully it will make things easier in the future: http://spark.apache.org/downloads.html On Mon, Jan 16, 2017 at 1:52 AM, Jacek Laskowski wrote: > Hi Ayan, > > Although my first reaction was "Why would anyone ever want to download > older

Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Hi, IBM states that when YARN is integrated with IBM Platform Symphony, you have more control over your

Re: how to dynamic partition dataframe

2017-01-19 Thread Michal Šenkýř
Hi, You can pass Seqs as varargs in Scala using this syntax: df.write.partitionBy(seq: _*) Michal On 18.1.2017 03:23, lk_spark wrote: hi, all: I want to partition data by reading a config file that tells me how to partition the current input data. DataFrameWriter has a method named:
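A fuller sketch of the same idea (column names and paths are hypothetical; the API shown is Spark 2.x, but the varargs expansion is identical on 1.6's DataFrameWriter):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partition-sketch").getOrCreate()
    import spark.implicits._

    // Stand-in data plus a partition-column list as it might come from a config file
    val df = Seq((2017, 1, "a"), (2017, 2, "b")).toDF("year", "month", "value")
    val partitionCols = Seq("year", "month")

    df.write.partitionBy(partitionCols: _*).parquet("/tmp/partitioned")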

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
It's available since Spark version 2.0.0; if you are on an earlier version, you can try the code below. result.write.format("csv").save(path) -- Hi, I tried the below code, as result.write.csv("home/Prasad/") It is not working; it says Error: value csv is not a member of
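For completeness, a hedged sketch of both variants (the output path is hypothetical; result is the DataFrame from the original mail below). On 1.x the csv data source lives in the external spark-csv package, added with e.g. --packages com.databricks:spark-csv_2.10:1.5.0:

    // Spark >= 2.0: csv is built into DataFrameWriter
    result.write.csv("/home/Prasad/out")

    // Spark 1.x: use the external spark-csv data source
    result.write.format("com.databricks.spark.csv").save("/home/Prasad/out")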

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, I tried the below code: result.write.csv("home/Prasad/") It is not working; it says Error: value csv is not a member of org.apache.spark.sql.DataFrameWriter. Regards Prasad On Thu, Jan 19, 2017 at 4:35 PM, smartzjp wrote: > Because the number of reducers will not be one,

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Chetan Khatri
Connect with the Bangalore Spark Meetup group. On Thu, Jan 19, 2017 at 3:07 PM, Deepak Sharma wrote: > Yes. > I will be there before 4 PM. > What's your contact number? > Thanks > Deepak > > On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu > wrote: >

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
Because the number of reducers will not be one, the output will be a folder on HDFS. You can use "result.write.csv(folderPath)". -- Hi, Can anyone please let us know how to write the output of Spark SQL to local and HDFS paths using Scala code? Code :- scala> val result =

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Thanks, Sean. I will explore online more. Regards, _ *Md. Rezaul Karim*, BSc, MSc PhD Researcher, INSIGHT Centre for Data Analytics National University of Ireland, Galway IDA Business Park, Dangan, Galway, Ireland Web: http://www.reza-analytics.eu/index.html

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Sean Owen
It's a message from the Hadoop libs, not Spark. It can be safely ignored. It's just saying you haven't installed the additional (non-Apache-licensed) native libs that can accelerate some operations. This is something you can easily read more about online. On Thu, Jan 19, 2017 at 10:57 AM Md.
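If the warning is just noise, one hedged way to silence it from application code (equivalent to raising the level for that logger in log4j.properties):

    import org.apache.log4j.{Level, Logger}

    // Only show errors from the Hadoop native-library loader
    Logger.getLogger("org.apache.hadoop.util.NativeCodeLoader").setLevel(Level.ERROR)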

"Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Hi All, I'm getting the following WARNING while running Spark jobs in standalone mode: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Please note that I have configured the native path and the other ENV variables as follows: export

Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, Can anyone please let us know how to write the output of Spark SQL to local and HDFS paths using Scala code? *Code :-* scala> val result = sqlContext.sql("select empno, name from emp"); scala> result.show(); If I give the command result.show() then it will print the output in the
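A sketch of the usual answer (paths are hypothetical; csv here assumes Spark 2.x, see the replies above for 1.x): the URI scheme on the path picks the filesystem, and the output is a directory of part files rather than a single file.

    // HDFS output
    result.write.format("csv").save("hdfs:///user/prasad/emp_out")
    // Local filesystem output
    result.write.format("csv").save("file:///home/Prasad/emp_out")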

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Deepak Sharma
Yes. I will be there before 4 PM. What's your contact number? Thanks Deepak On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu wrote: > Are we meeting today?! > > On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > >> Hi , >> >> Just thought of keeping my

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Sirisha Cheruvu
Are we meeting today?! On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > Hi, > > Just thought of sharing my intention of working together with Spark > developers who are also from Bangalore, so that we can brainstorm > together and work out solutions on our projects? > > >

Re: is partitionBy of DataFrameWriter supported in 1.6.x?

2017-01-19 Thread Takeshi Yamamuro
Hi, In v1.6.0, it seems Spark has supported `partitionBy` for JSON, text, ORC, and Avro, so this is a bug in the documentation. Actually, this bug was fixed in v1.6.1 (See: https://github.com/apache/spark/commit/1005ee396f74dc4fcf127613b65e1abdb7f1934c ) Also, AFAIK, this document only describes