Re: Kafka directsream receiving rate

2016-02-06 Thread Diwakar Dhanuskodi
Thanks, Cody, for trying to understand the issue. Sorry if I am not clear. The scenario is to process all messages at once in a single dstream block when the source system publishes messages. The source system will publish x messages every 10 minutes. By events I meant that total

Fwd: Question on how to access tuple values in spark

2016-02-06 Thread mdkhajaasmath
> Hi, > > My requirement is to find the max value of revenue per customer, so I am using the below > query. I got this solution from a tutorial I found on Google but am not able to > understand how it returns the max in this scenario. Can anyone help? > > revenuePerDayPerCustomerMap.reduceByKey((x, y) => (if(x._2 >=
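
The quoted query is cut off, but the pattern is presumably the standard pairwise-max reduce: reduceByKey keeps, per customer key, whichever tuple wins the comparison. A minimal self-contained Scala sketch, with example data and tuple layout that are assumptions rather than the poster's actual schema:

import org.apache.spark.{SparkConf, SparkContext}

object MaxRevenuePerCustomer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("max-revenue").setMaster("local[*]"))

    // (customerId, (day, revenue)) pairs -- hypothetical sample data
    val revenuePerDayPerCustomerMap = sc.parallelize(Seq(
      ("c1", ("2016-02-01", 100.0)),
      ("c1", ("2016-02-02", 250.0)),
      ("c2", ("2016-02-01", 80.0))))

    // For each key, the lambda compares two candidate tuples and keeps the
    // one with the larger revenue (x._2 vs y._2); the survivor is the max.
    val maxPerCustomer = revenuePerDayPerCustomerMap
      .reduceByKey((x, y) => if (x._2 >= y._2) x else y)

    maxPerCustomer.collect().foreach(println)
    // (c1,(2016-02-02,250.0))
    // (c2,(2016-02-01,80.0))
    sc.stop()
  }
}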

Re: Question on how to access tuple values in spark

2016-02-06 Thread mdkhajaasmath
Sent from my iPhone > On Feb 6, 2016, at 4:41 PM, KhajaAsmath Mohammed > wrote: > > Hi, > > My requirement is to find the max value of revenue per customer, so I am using the below > query. I got this solution from a tutorial I found on Google but am not able to > understand how it

Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Andrew Holway
I'm managing to read data via JDBC using the following, but I can't work out how to write something back to the database. df <- read.df(sqlContext, source="jdbc", url="jdbc:mysql://hostname:3306?user=user&password=pass", dbtable="database.table") Does this functionality exist in 1.5.2? Thanks,
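
For reference, the Scala DataFrame API in 1.5.x does expose a JDBC writer even where SparkR's write.df may not; a hedged sketch (the URL, table names, and credentials are placeholders, and the MySQL JDBC driver must be on the classpath):

import java.util.Properties
import org.apache.spark.sql.{SQLContext, SaveMode}
import org.apache.spark.{SparkConf, SparkContext}

object JdbcWriteSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("jdbc-write").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    val props = new Properties()
    props.setProperty("user", "user")
    props.setProperty("password", "pass")

    // Read, mirroring the SparkR read.df call above
    val df = sqlContext.read.jdbc("jdbc:mysql://hostname:3306/database", "database.table", props)

    // Write back; SaveMode controls what happens if the target table exists
    df.write.mode(SaveMode.Append)
      .jdbc("jdbc:mysql://hostname:3306/database", "database.table_copy", props)

    sc.stop()
  }
}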

Re: Writing to jdbc database from SparkR (1.5.2)

2016-02-06 Thread Andrew Holway
> > df <- read.df(sqlContext, source="jdbc", > url="jdbc:mysql://hostname:3306?user=user&password=pass", > dbtable="database.table") > I got a bit further but am now getting the following error. This error is being thrown without the database being touched. I tested this by making the database

Re: Help needed in deleting a message posted in Spark User List

2016-02-06 Thread Corey Nolet
The whole purpose of Apache mailing lists is that the messages get indexed all over the web so that discussions and questions/solutions can be searched easily by Google and other engines. For this reason, and because the messages are sent via email as Steve pointed out, it's just not possible to

Spark Streaming with Druid?

2016-02-06 Thread unk1102
Hi, has anybody tried Spark Streaming with Druid as a low-latency store? The combination seems powerful; is it worth trying them together? Please guide and share your experience. I am aiming to build the best low-latency streaming analytics.

Re: Help needed in deleting a message posted in Spark User List

2016-02-06 Thread Steve Loughran
> On 5 Feb 2016, at 17:35, Marcelo Vanzin wrote: > > You don't... just send a new one. > > On Fri, Feb 5, 2016 at 9:33 AM, swetha kasireddy > wrote: >> Hi, >> >> I want to edit/delete a message posted in Spark User List. How do I do that? >>

Re: Spark Streaming - 1.6.0: mapWithState Kinesis huge memory usage

2016-02-06 Thread Udo Fholl
Sorry, I realized that I left out a bit in my last email. This is the only BLOCKED thread in the dump. The Reference Handler is blocked most likely due to the GC running at the moment of the dump. "Reference Handler" daemon prio=10 tid=2 BLOCKED at java.lang.Object.wait(Native Method) at
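
Separate from the thread dump itself: for mapWithState heap growth of this kind, one commonly suggested mitigation is a state timeout so idle keys get evicted. A minimal sketch against the 1.6 API (the source, checkpoint path, and state type are assumptions; the original job reads from Kinesis instead):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("mws").setMaster("local[2]"), Seconds(10))
    ssc.checkpoint("/tmp/mws-checkpoint") // placeholder path

    // Hypothetical source; the thread's job uses a Kinesis stream instead.
    val events = ssc.socketTextStream("localhost", 9999).map(line => (line, 1))

    // Running count per key. A timing-out state cannot be updated, hence the guard.
    val mappingFunc = (key: String, value: Option[Int], state: State[Int]) => {
      val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
      if (!state.isTimingOut()) state.update(sum)
      (key, sum)
    }

    // The timeout bounds state (and heap) growth by evicting idle keys.
    val stateful = events.mapWithState(
      StateSpec.function(mappingFunc).timeout(Minutes(30)))

    stateful.print()
    ssc.start()
    ssc.awaitTermination()
  }
}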

Re: Kafka directsream receiving rate

2016-02-06 Thread Cody Koeninger
I am not at all clear on what you are saying. "Yes, I am printing each message. It is processing all messages under each dstream block." If it is processing all messages, what is the problem you are having? "The issue is with DirectStream processing 10 messages per event." What

Re: Shuffle memory woes

2016-02-06 Thread Corey Nolet
Igor, thank you for the response, but unfortunately the problem I'm referring to goes beyond this. I have set the shuffle memory fraction to 90% and the cache memory fraction to 0. Repartitioning the RDD helped a tad on the map side but didn't do much for the spilling when there was no longer

Re: Slowness in Kmeans calculating fastSquaredDistance

2016-02-06 Thread Li Ming Tsai
Hi, I did more investigation and found out that BLAS.scala is calling the pure-Java reference implementation (f2jBLAS) for level 1 routines. I even patched it to use nativeBlas.ddot, but it had no material impact.
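
A quick way to confirm at runtime which implementation netlib-java actually bound (a hedged sketch; it assumes the com.github.fommil.netlib classes are on the classpath, which they are as a transitive dependency of spark-mllib):

import com.github.fommil.netlib.{BLAS, LAPACK}

object BlasCheck {
  def main(args: Array[String]): Unit = {
    // F2jBLAS means the pure-Java fallback is in use; NativeSystemBLAS or
    // NativeRefBLAS means a native library was successfully loaded.
    println("BLAS:   " + BLAS.getInstance().getClass.getName)
    println("LAPACK: " + LAPACK.getInstance().getClass.getName)
  }
}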

Apache Spark data locality when integrating with Kafka

2016-02-06 Thread fanooos
Dears, if I use Kafka as a streaming source for some Spark jobs, is it advisable to install Spark on the same nodes as the Kafka cluster? What are the benefits and drawbacks of such a decision? Regards

RE: Apache Spark data locality when integrating with Kafka

2016-02-06 Thread Diwakar Dhanuskodi
Yes, to reduce network latency. Sent from Samsung Mobile. Original message From: fanooos Date: 07/02/2016 09:24 (GMT+05:30) To: user@spark.apache.org Cc: Subject: Apache Spark data locality when integrating with Kafka Dears, if I use

Re: Apache Spark data locality when integrating with Kafka

2016-02-06 Thread Koert Kuipers
Spark can benefit from data locality and will try to launch tasks on the node where the Kafka partition resides. However, I think in production many organizations run a dedicated Kafka cluster. On Sat, Feb 6, 2016 at 11:27 PM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > Yes, to

Imported CSV file content isn't identical to the original file

2016-02-06 Thread SLiZn Liu
Hi Spark Users Group, I have a CSV file to analyze with Spark, but I'm having trouble importing it as a DataFrame. Here's a minimal reproducible example. Suppose I have a *10(rows)x2(cols)* *space-delimited csv* file, shown below: 1446566430 2015-11-0400:00:30 1446566430 2015-11-0400:00:30
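
One common way to load a non-comma-delimited file in the Spark 1.x era is the spark-csv package with an explicit delimiter and schema; a hedged sketch (the path and column names are assumptions, and it presumes com.databricks:spark-csv is on the classpath):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.{SparkConf, SparkContext}

object SpaceDelimitedCsv {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("csv").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Declaring both columns as strings sidesteps type inference, which is
    // one common reason imported content differs from the original file.
    val schema = StructType(Seq(
      StructField("epoch", StringType, nullable = false),
      StructField("timestamp", StringType, nullable = false)))

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("delimiter", " ") // space-delimited rather than comma
      .schema(schema)
      .load("/path/to/file.csv") // placeholder path

    df.show(10, false) // truncate = false, to see full cell contents
    sc.stop()
  }
}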

Re: Bad Digest error while doing aws s3 put

2016-02-06 Thread Dhimant
Hi, I am getting the following error while reading huge data from S3 and, after processing, writing data to S3 again. Did you find any solution for this? 16/02/07 07:41:59 WARN scheduler.TaskSetManager: Lost task 144.2 in stage 3.0 (TID 169, ip-172-31-7-26.us-west-2.compute.internal):

Re: different behavior while using createDataFrame and read.df in SparkR

2016-02-06 Thread Devesh Raj Singh
Thank you, Rui Sun, for the observation! It helped. I have a new problem arising. When I create a small function for dummy-variable creation for a categorical column: BDADummies <- function(dataframe, column) { cat.column <- vector(mode="character", length=nrow(dataframe)) cat.column <- collect(column)

Re: Kafka directsream receiving rate

2016-02-06 Thread Diwakar Dhanuskodi
Cody, yes, I am printing each message. It is processing all messages under each dstream block. Source systems are publishing 1 million messages / 4 secs, which is less than the batch interval. The issue is with DirectStream processing 10 messages per event. When partitions were
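
If only a handful of records arrive per batch despite a much higher publish rate, one thing worth ruling out is the direct stream's rate limiting; a hedged sketch of the relevant settings against the 1.x API (the broker list, topic, and numbers are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirectStreamRate {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("direct-stream-rate")
      .setMaster("local[2]")
      // Caps records pulled per partition per second; if set too low it makes
      // every batch look tiny no matter how fast the producer publishes.
      .set("spark.streaming.kafka.maxRatePerPartition", "100000")
      // Alternatively, let Spark adapt the ingest rate to processing speed.
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.count().print() // how many records each batch actually pulled

    ssc.start()
    ssc.awaitTermination()
  }
}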

Re: Shuffle memory woes

2016-02-06 Thread Igor Berman
Hi, usually you can solve this in 2 steps: make the RDD have more partitions, and play with the shuffle memory fraction. In Spark 1.6, cache vs. shuffle memory fractions are adjusted automatically. On 5 February 2016 at 23:07, Corey Nolet wrote: > I just recently had a discovery that my
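
A hedged sketch of those two steps against the pre-1.6 static memory model (the fractions and partition count are illustrative assumptions, not recommendations; in 1.6+, the unified memory manager sizes cache vs. shuffle automatically):

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleTuning {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("shuffle-tuning")
      .setMaster("local[*]")
      // Pre-1.6 static model: grow shuffle memory at the expense of cache.
      .set("spark.shuffle.memoryFraction", "0.6")
      .set("spark.storage.memoryFraction", "0.2")

    val sc = new SparkContext(conf)
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 1000, i))

    // More partitions per shuffle => smaller per-task buffers, less spilling.
    val counts = pairs.reduceByKey(_ + _, 400)

    println(counts.count())
    sc.stop()
  }
}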