Re: freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Takeshi Yamamuro
Hi, AFAIK, the blocks of minibatch RDDs are checked after every job finishes, and older blocks are automatically removed (See: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463 ). You can control this behaviour by
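For reference, a minimal sketch of the knobs involved (the app name and durations below are hypothetical): spark.streaming.unpersist lets Spark drop blocks of already-processed minibatch RDDs, and StreamingContext.remember sets how long generated RDDs are kept before they become eligible for cleanup.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    // Let Spark Streaming unpersist already-processed minibatch RDDs (the default)
    val conf = new SparkConf()
      .setAppName("block-cleanup-sketch")
      .set("spark.streaming.unpersist", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    // Keep generated RDDs around at least this long; a shorter window
    // frees "Stream Blocks" memory sooner
    ssc.remember(Minutes(2))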

Re: How to do dashboard reporting in spark

2017-01-19 Thread Jörn Franke
You can use Zeppelin if you want to directly interact with Spark. For traditional tools you have the right idea (any of them works, depending on requirements). See also the lambda architecture. > On 20 Jan 2017, at 08:18, Gaurav1809 wrote: > > Hi All, > > > Once data is

How to do dashboard reporting in spark

2017-01-19 Thread Gaurav1809
Hi All, Once data is stored in data frames, what's next? Where do we go from there? Do we store the data in Hive or an RDBMS (Oracle, MySQL, Teradata)? How do we do dashboard reporting based on the data present in data frames? If there is any BI tool available in the Spark ecosystem, please suggest.
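One common pattern, not from this thread (all table names, URLs, and credentials below are hypothetical): persist the DataFrame somewhere a BI tool can query, either as a Hive table or in an RDBMS over JDBC.

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("reporting-sketch")
      .enableHiveSupport()
      .getOrCreate()
    val df = spark.range(100).toDF("id")   // stand-in for the real data

    // Persist as a Hive table that BI tools can query
    df.write.mode("overwrite").saveAsTable("reporting.daily_metrics")

    // ...or push it to an RDBMS over JDBC
    val props = new Properties()
    props.setProperty("user", "report_user")
    props.setProperty("password", "secret")
    df.write.mode("overwrite")
      .jdbc("jdbc:mysql://dbhost:3306/reports", "daily_metrics", props)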

Re: spark 2.0.2 error when writing to s3

2017-01-19 Thread Palash Gupta
Hi, You need to add the overwrite save mode option to avoid this error. //P.Gupta Sent from Yahoo Mail on Android On Fri, 20 Jan, 2017 at 2:15 am, VND Tremblay, Paul wrote: I have come across a problem when writing CSV files to S3 in Spark 2.0.2. The problem does not exist
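Concretely, a sketch of the suggested fix (bucket and path are hypothetical):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("overwrite-sketch").getOrCreate()
    val df = spark.range(10).toDF("id")   // stand-in data

    // Overwrite existing output instead of failing with "File already exists"
    df.write.mode(SaveMode.Overwrite).csv("s3://your-bucket/revenue_model/")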

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Oshadha Gunawardena
On Fri, Jan 20, 2017 at 11:26 AM, Gavin Yue wrote: > PST or EST? > > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > > On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 com>

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Gavin Yue
PST or EST? > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > >> On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 >> wrote: >> Get Outlook for Android >> Happiest Minds Disclaimer >> This

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread ayan guha
Sure...we will wait :) :) Just kidding On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 wrote: > Get Outlook for Android > -- > Happiest Minds Disclaimer > > This message is for the sole use of the intended

Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Manohar753
Get Outlook for Android Happiest Minds Disclaimer This message is for the sole use of the intended recipient(s) and may contain confidential, proprietary or legally privileged information. Any unauthorized review, use, disclosure or

Re: Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Thanks Kai. I am getting the following message, and it gets stuck, when I run sbt. Any idea? "Set current project to spark-parent (in build file:/home/cloudera/spark/)" Details attached. Regards, Deepu Raj +61 414 707 319 On Fri, 20 Jan 2017 10:27:16 +1100, Kai Jiang wrote: Hi

Re: spark 2.0.2 error when writing to s3

2017-01-19 Thread Takeshi Yamamuro
Hi, Do you get the same exception in v2.1.0 as well? Anyway, I think I saw another guy reporting the same error: https://www.mail-archive.com/user@spark.apache.org/msg60882.html // maropu On Fri, Jan 20, 2017 at 5:15 AM, VND Tremblay, Paul wrote: > I have come across a

Re: Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-19 Thread shyla deshpande
There was an issue connecting to Kafka; once that was fixed, the Spark app works. Hope this helps someone. Thanks On Mon, Jan 16, 2017 at 7:58 AM, shyla deshpande wrote: > Hello, > I checked the log file on the worker node and don't see any error there. > This is the

Non-linear (curved?) regression line

2017-01-19 Thread Ganesh
Has anyone worked on non-linear/curved regression lines with Apache Spark? This seems like such a trivial issue, but I have given up after experimenting for nearly two weeks. The plot line is as below and the raw data is in the table at the end. I just can't get Spark ML to give decent
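One way to fit a curved line with Spark ML is to polynomially expand the feature vector and fit an ordinary linear regression on the expanded features; a sketch with toy data (not the poster's), assuming Spark 2.x:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.{PolynomialExpansion, VectorAssembler}
    import org.apache.spark.ml.regression.LinearRegression
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("poly-regression-sketch").getOrCreate()
    import spark.implicits._

    // Toy data roughly following y = x^2 (stand-in for the real table)
    val data = Seq((1.0, 1.1), (2.0, 4.2), (3.0, 8.9), (4.0, 16.3), (5.0, 24.8))
      .toDF("x", "y")

    val assembler = new VectorAssembler().setInputCols(Array("x")).setOutputCol("features")
    val poly = new PolynomialExpansion()
      .setInputCol("features").setOutputCol("polyFeatures").setDegree(3)
    val lr = new LinearRegression().setFeaturesCol("polyFeatures").setLabelCol("y")

    val model = new Pipeline().setStages(Array(assembler, poly, lr)).fit(data)
    model.transform(data).select("x", "y", "prediction").show()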

Re: Executors - running out of memory

2017-01-19 Thread sanat kumar Patnaik
Please try and play with spark-defaults.conf for EMR. Dynamic allocation = true is there by default for EMR 4.4 and above. What is the EMR version you are using? http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#d0e20458 On Thu, Jan 19, 2017 at 5:02 PM, Venkata D
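For reference, the kind of properties worth experimenting with (values are hypothetical; the same keys can go in spark-defaults.conf on EMR). Raising the YARN memory overhead is the usual first step when containers are killed for exceeding memory limits:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.executor.memory", "8g")
      // off-heap headroom that YARN accounts for on top of the executor heap
      .set("spark.yarn.executor.memoryOverhead", "2048")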

Re: Spark Source Code Configuration

2017-01-19 Thread Kai Jiang
Hi Deepu, Hope this page can give you some help: http://spark.apache.org/developer-tools.html Best, Kai On Thu, Jan 19, 2017, 14:51 Deepu Raj wrote: > Hi, > > Is there any article/docs/support to set up the Apache Spark source code on > Eclipse/IntelliJ? > > I have tried

Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Hi, Is there any article/docs/support to set up the Apache Spark source code on Eclipse/IntelliJ? I have tried setting up the source code by importing it from Git and using Maven, but I am getting a lot of compilation errors. #suggestions Regards, Deepu Raj +61 414 707 319

Re: Executors - running out of memory

2017-01-19 Thread Venkata D
blondowski, How big is your JSON file? Is it possible to post the Spark params or configuration here? That might give some idea about the issue. Thanks On Thu, Jan 19, 2017 at 4:21 PM, blondowski wrote: > Please bear with me..I'm fairly new to spark. Running

Re: How to save spark-ML model in Java?

2017-01-19 Thread Xiaomeng Wan
cv.fit is going to give you a CrossValidatorModel. If you want to extract the real model built, you need to do: val cvModel = cv.fit(data) val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel] val model = plmodel.stages(2).asInstanceOf[whatever_model] Then you can call model.save
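Spelled out as a fuller sketch, assuming the cv and data from the snippet above, and assuming the pipeline's third stage is a random forest (the stage index, model class, and save paths are hypothetical; adjust to your pipeline):

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.RandomForestClassificationModel
    import org.apache.spark.ml.tuning.CrossValidatorModel

    val cvModel: CrossValidatorModel = cv.fit(data)
    val plModel = cvModel.bestModel.asInstanceOf[PipelineModel]
    val rfModel = plModel.stages(2).asInstanceOf[RandomForestClassificationModel]

    // Persist just the extracted stage (Spark >= 2.0)...
    rfModel.save("hdfs:///models/best-rf")
    // ...or the whole best pipeline
    plModel.save("hdfs:///models/best-pipeline")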

Executors - running out of memory

2017-01-19 Thread blondowski
Please bear with me... I'm fairly new to Spark. Running pyspark 2.0.1 on AWS EMR (6-node cluster with 475GB of RAM). We have a job that creates a dataframe from JSON files, then does some manipulation (adds columns) and then calls a UDF. The job fails on the UDF call with Container killed by YARN

dataset aggregators with kryo encoder very slow

2017-01-19 Thread Koert Kuipers
we just converted a job from RDD to Dataset. the job does a single map-red phase using aggregators. we are seeing very bad performance for the Dataset version, about 10x slower. in the Dataset version we use kryo encoders for some of the aggregators. based on some basic profiling of spark in
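For context, the general shape of a Dataset aggregator with a kryo-encoded buffer (a toy average, not the poster's job): with Encoders.kryo the buffer object is fully serialized and deserialized on every update, while primitive and product encoders can operate on Spark's internal binary format, which is one plausible source of a gap like this.

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
    import org.apache.spark.sql.expressions.Aggregator

    case class Buf(sum: Long, count: Long)

    // Toy aggregator computing an average with a kryo-encoded buffer
    object AvgAgg extends Aggregator[Long, Buf, Double] {
      def zero: Buf = Buf(0L, 0L)
      def reduce(b: Buf, a: Long): Buf = Buf(b.sum + a, b.count + 1)
      def merge(x: Buf, y: Buf): Buf = Buf(x.sum + y.sum, x.count + y.count)
      def finish(b: Buf): Double = if (b.count == 0) 0.0 else b.sum.toDouble / b.count
      def bufferEncoder: Encoder[Buf] = Encoders.kryo[Buf]   // ser/de on every row
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

    val spark = SparkSession.builder().appName("agg-sketch").getOrCreate()
    import spark.implicits._
    spark.range(1000).as[Long].select(AvgAgg.toColumn).show()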

spark 2.0.2 error when writing to s3

2017-01-19 Thread VND Tremblay, Paul
I have come across a problem when writing CSV files to S3 in Spark 2.0.2. The problem does not exist in Spark 1.6. 19:09:20 Caused by: java.io.IOException: File already exists:s3://stx-apollo-pr-datascience-internal/revenue_model/part-r-00025-c48a0d52-9600-4495-913c-64ae6bf888bd.csv My code

Re: How to save spark-ML model in Java?

2017-01-19 Thread Minudika Malshan
Hi, Thanks Rezaul and Asher Krim. The method suggested by Rezaul works fine for NaiveBayes but still fails for RandomForest and the multilayer perceptron classifier. Everything is saved properly until this stage. CrossValidator cv = new CrossValidator() .setEstimator(pipeline)

freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Andrew Milkowski
Hello. Using Spark 2.0.2, while running a sample streaming app with Kinesis, I noticed (in the admin UI Storage tab) that "Stream Blocks" for each worker keeps climbing. Also (on the same UI page), in the Blocks section, I see blocks such as input-0-1484753367056 below that are marked as Memory Serialized

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
Thanks Sidney From: Sidney Feiner Sent: Thursday, January 19, 2017 9:52 AM To: jeff saremi Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Every executor creates a directory with your submitted files and

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I wish someone added this to the documentation. From: jeff saremi Sent: Thursday, January 19, 2017 9:56 AM To: Sidney Feiner Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Thanks Sidney

Re: Spark-submit: where do --files go?

2017-01-19 Thread Sidney Feiner
Every executor creates a directory with your submitted files, and you can access every file's absolute path with the following: val fullFilePath = SparkFiles.get(fileName) On Jan 19, 2017 19:35, jeff saremi wrote: I'd like to know how -- From within Java/spark --
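A minimal sketch of the pattern (the file name is hypothetical), assuming the job was submitted with something like spark-submit --files /local/path/lookup.txt:

    import org.apache.spark.SparkFiles

    // Resolves to the absolute path of the shipped copy on each executor
    val path = SparkFiles.get("lookup.txt")
    val lines = scala.io.Source.fromFile(path).getLines().toList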

Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I'd like to know how -- from within Java/Spark -- I can access the dependent files which I deploy using the "--files" option on the command line.

[SparkStreaming] SparkStreaming not allowing to do parallelize within a transform operation to generate a new RDD

2017-01-19 Thread Nipun Arora
Hi All, Can anyone suggest a way to create and "add to an RDD", as I describe below, in a transform operation? I found that the error I observe goes away if I comment out "ssc.checkpoint()". However, I need checkpointing in later stages. I would really appreciate any help. Thanks Nipun
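One shape this often takes (a sketch; the stream and extraRecords below are hypothetical, and whether it resolves this particular error depends on what the closure captures): obtain the SparkContext from the batch RDD inside transform, rather than referencing the StreamingContext, so the closure stays serializable for checkpointing.

    import org.apache.spark.streaming.dstream.DStream

    def withExtras(stream: DStream[String], extraRecords: Seq[String]): DStream[String] =
      stream.transform { rdd =>
        // rdd.sparkContext avoids capturing the non-serializable StreamingContext
        rdd.union(rdd.sparkContext.parallelize(extraRecords))
      }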

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Thanks Kuan for insight. Much appreciated. Mich Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://talebzadehmich.wordpress.com

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Kuan Feng
Greetings, Dr Mich Talebzadeh, This is Kuan from the IBM Platform team. Thank you for your interest in the Platform Symphony and Spark products. I'm writing this mail to clarify the "EGO-YARN" in that blog post you were referring to. EGO is an enterprise resource orchestration component

Re: Old version of Spark [v1.2.0]

2017-01-19 Thread Luciano Resende
The download page has been updated; hopefully it will make things easier in the future: http://spark.apache.org/downloads.html On Mon, Jan 16, 2017 at 1:52 AM, Jacek Laskowski wrote: > Hi Ayan, > > Although my first reaction was "Why would anyone ever want to download > older

Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Hi, IBM states that when YARN is integrated with IBM Platform Symphony, you have more control over your

Re: how to dynamic partition dataframe

2017-01-19 Thread Michal Šenkýř
Hi, You can pass Seqs as varargs in Scala using this syntax: df.write.partitionBy(seq: _*) Michal On 18.1.2017 03:23, lk_spark wrote: hi, all: I want to partition data by reading a config file that tells me how to partition the current input data. DataFrameWriter has a method named:
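A fuller sketch of the same idea (column names and paths are hypothetical; the API shown is Spark 2.x, but the varargs expansion is identical on 1.6's DataFrameWriter):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partition-sketch").getOrCreate()
    import spark.implicits._

    // Stand-in data plus a partition-column list as it might come from a config file
    val df = Seq((2017, 1, "a"), (2017, 2, "b")).toDF("year", "month", "value")
    val partitionCols = Seq("year", "month")

    df.write.partitionBy(partitionCols: _*).parquet("/tmp/partitioned")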

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
It's available since Spark version 2.0.0; if you are on an earlier version, you can try the code below. result.write.format("csv").save(path) -- Hi, I tried the below code, as result.write.csv("home/Prasad/") It is not working; it says Error: value csv is not a member of
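For completeness, a hedged sketch of both variants (the output path is hypothetical; result is the DataFrame from the original mail below). On 1.x the csv data source lives in the external spark-csv package, added with e.g. --packages com.databricks:spark-csv_2.10:1.5.0:

    // Spark >= 2.0: csv is built into DataFrameWriter
    result.write.csv("/home/Prasad/out")

    // Spark 1.x: use the external spark-csv data source
    result.write.format("com.databricks.spark.csv").save("/home/Prasad/out")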

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, I tried the below code: result.write.csv("home/Prasad/") It is not working; it says Error: value csv is not a member of org.apache.spark.sql.DataFrameWriter. Regards Prasad On Thu, Jan 19, 2017 at 4:35 PM, smartzjp wrote: > Because the number of reducers will not be one,

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Chetan Khatri
Connect with the Bangalore Spark Meetup group. On Thu, Jan 19, 2017 at 3:07 PM, Deepak Sharma wrote: > Yes. > I will be there before 4 PM. > What's your contact number? > Thanks > Deepak > > On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu > wrote: >

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
Because the number of reducers will not be one, the output will be a folder on HDFS. You can use "result.write.csv(folderPath)". -- Hi, Can anyone please let us know how to write the output of Spark SQL to local and HDFS paths using Scala code? Code :- scala> val result =

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Thanks, Sean. I will explore online more. Regards, _ *Md. Rezaul Karim*, BSc, MSc PhD Researcher, INSIGHT Centre for Data Analytics National University of Ireland, Galway IDA Business Park, Dangan, Galway, Ireland Web: http://www.reza-analytics.eu/index.html

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Sean Owen
It's a message from the Hadoop libs, not Spark. It can be safely ignored. It's just saying you haven't installed the additional (non-Apache-licensed) native libs that can accelerate some operations. This is something you can easily read more about online. On Thu, Jan 19, 2017 at 10:57 AM Md.
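If the warning is just noise, one hedged way to silence it from application code (equivalent to raising the level for that logger in log4j.properties):

    import org.apache.log4j.{Level, Logger}

    // Only show errors from the Hadoop native-library loader
    Logger.getLogger("org.apache.hadoop.util.NativeCodeLoader").setLevel(Level.ERROR)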

"Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Hi All, I'm getting the following WARNING while running Spark jobs in standalone mode: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Please note that I have configured the native path and the other ENV variables as follows: export

Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, Can anyone please let us know how to write the output of Spark SQL to local and HDFS paths using Scala code? *Code :-* scala> val result = sqlContext.sql("select empno, name from emp"); scala> result.show(); If I give the command result.show() then it will print the output in the
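A sketch of the usual answer (paths are hypothetical; csv here assumes Spark 2.x, see the replies above for 1.x): the URI scheme on the path picks the filesystem, and the output is a directory of part files rather than a single file.

    // HDFS output
    result.write.format("csv").save("hdfs:///user/prasad/emp_out")
    // Local filesystem output
    result.write.format("csv").save("file:///home/Prasad/emp_out")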

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Deepak Sharma
Yes. I will be there before 4 PM. What's your contact number? Thanks Deepak On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu wrote: > Are we meeting today?! > > On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > >> Hi , >> >> Just thought of keeping my

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Sirisha Cheruvu
Are we meeting today?! On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > Hi, > > Just thought of sharing my intention of working together with Spark > developers who are also from Bangalore, so that we can brainstorm > together and work out solutions on our projects? > > >

Re: is partitionBy of DataFrameWriter supported in 1.6.x?

2017-01-19 Thread Takeshi Yamamuro
Hi, In v1.6.0, it seems Spark has supported `partitionBy` for JSON, text, ORC, and Avro, so this is a bug in the documentation. Actually, this bug was fixed in v1.6.1 (See: https://github.com/apache/spark/commit/1005ee396f74dc4fcf127613b65e1abdb7f1934c ) Also, AFAIK, this document only describes