Thanks, Cody, for trying to understand the issue.
Sorry if I am not being clear.
The scenario is to process all messages at once in a single DStream block
when the source system publishes messages. The source system publishes x
messages once every 10 minutes.
By events I meant that the total
> Hi,
>
> My requirement is to find the max value of revenue per customer, so I am using the
> query below. I got this solution from a tutorial I found on Google, but I am not able to
> understand how it returns the max in this scenario. Can anyone help?
>
> revenuePerDayPerCustomerMap.reduceByKey((x, y) => (if(x._2 >=
Sent from my iPhone
> On Feb 6, 2016, at 4:41 PM, KhajaAsmath Mohammed
> wrote:
>
> Hi,
>
> My requirement is to find the max value of revenue per customer, so I am using the
> query below. I got this solution from a tutorial I found on Google, but I am not able to
> understand how it
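For what it's worth, here is a minimal, self-contained sketch of the pattern the quoted snippet appears to use (the sample data and app name are made up, not from the thread). reduceByKey calls the function on two values of the same key at a time; returning whichever (day, revenue) pair has the larger revenue leaves exactly one pair, the maximum, per customer:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("max-revenue-sketch").setMaster("local[*]"))

// key = customer, value = (day, revenue); sample data is illustrative only
val revenuePerDayPerCustomerMap = sc.parallelize(Seq(
  ("cust1", ("2016-02-01", 100.0)),
  ("cust1", ("2016-02-02", 250.0)),
  ("cust2", ("2016-02-01", 80.0))))

// keep whichever pair has the larger revenue (_2) each time two values are combined
val maxRevenuePerCustomer =
  revenuePerDayPerCustomerMap.reduceByKey((x, y) => if (x._2 >= y._2) x else y)

maxRevenuePerCustomer.collect().foreach(println)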
I'm managing to read data via JDBC using the following, but I can't work out
how to write something back to the database.
df <- read.df(sqlContext, source="jdbc",
url="jdbc:mysql://hostname:3306?user=user=pass",
dbtable="database.table")
Does this functionality exist in 1.5.2?
Thanks,
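For comparison, a rough sketch of the write path in the Scala API (hostname, credentials, and table name are placeholders; it assumes a DataFrame `df` like the one read above):

import java.util.Properties

val props = new Properties()
props.setProperty("user", "user")          // placeholder credentials
props.setProperty("password", "pass")

// appends the rows of df to database.table over JDBC
df.write
  .mode("append")
  .jdbc("jdbc:mysql://hostname:3306/database", "table", props)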
>
> df <- read.df(sqlContext, source="jdbc",
> url="jdbc:mysql://hostname:3306?user=user=pass",
> dbtable="database.table")
>
I got a bit further but am now getting the following error. This error is
being thrown without the database being touched. I tested this by making
the database
The whole purpose of Apache mailing lists is that the messages get indexed
all over the web so that discussions and questions/solutions can be
searched easily via Google and other engines.
For this reason, and because the messages are sent via email, as Steve pointed
out, it's just not possible to
Hi, did anybody try Spark Streaming with Druid as a low-latency store?
The combination seems powerful; is it worth trying the two together? Please advise
and share your experience. I am after building the best low-latency
streaming analytics.
> On 5 Feb 2016, at 17:35, Marcelo Vanzin wrote:
>
> You don't... just send a new one.
>
> On Fri, Feb 5, 2016 at 9:33 AM, swetha kasireddy
> wrote:
>> Hi,
>>
>> I want to edit/delete a message posted in Spark User List. How do I do that?
>>
Sorry, I realized that I left out a bit of the last email.
This is the only BLOCKED thread in the dump. The Reference Handler is blocked,
most likely because the GC was running at the moment of the dump.
"Reference Handler" daemon prio=10 tid=2 BLOCKED
at java.lang.Object.wait(Native Method)
at
I am not at all clear on what you are saying.
"Yes , I am printing each messages . It is processing all messages
under each dstream block." If it is processing all messages, what is the
problem you are having?
"The issue is with Directsream processing 10 message per event. " What
Igor,
Thank you for the response, but unfortunately the problem I'm referring to
goes beyond this. I have set the shuffle memory fraction to be 90% and set
the cache memory to be 0. Repartitioning the RDD helped a tad on the map
side but didn't do much for the spilling when there was no longer
Hi,
I did more investigation and found out that BLAS.scala calls the Java
reference implementation (f2jBLAS) for level 1 routines.
I even patched it to use nativeBlas.ddot, but it has no material impact.
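For reference, a small sketch of how one can compare the two implementations directly via netlib-java (assuming netlib-java is on the classpath; the vector size is arbitrary):

import com.github.fommil.netlib.{BLAS => NetlibBLAS, F2jBLAS}

val n = 1000000
val x = Array.fill(n)(scala.util.Random.nextDouble())
val y = Array.fill(n)(scala.util.Random.nextDouble())

// pure-Java reference implementation (what the level 1 routines go through)
val f2j = new F2jBLAS
// getInstance() returns a native implementation when available, else falls back to Java
val native = NetlibBLAS.getInstance()

println("f2j    ddot: " + f2j.ddot(n, x, 1, y, 1))
println("native ddot: " + native.ddot(n, x, 1, y, 1))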
Dears,
If I use Kafka as a streaming source for some Spark jobs, is it advisable
to install Spark on the same nodes as the Kafka cluster?
What are the benefits and drawbacks of such a decision?
Regards
Yes. To reduce network latency.
Sent from Samsung Mobile.
-------- Original message --------
From: fanooos
Date: 07/02/2016 09:24 (GMT+05:30)
To: user@spark.apache.org
Cc:
Subject: Apache Spark data locality when integrating with Kafka
Dears
If I will use
Spark can benefit from data locality and will try to launch tasks on the
node where the Kafka partition resides.
However, I think in production many organizations run a dedicated Kafka
cluster.
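For context, a sketch of the direct stream setup being discussed (broker addresses and topic name are placeholders): the direct approach creates one RDD partition per Kafka partition, so with executors running on the broker nodes the scheduler can place a task on the node holding that partition's leader, as described above.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-direct-sketch")
val ssc = new StreamingContext(conf, Seconds(10))

// placeholders: point these at your own brokers and topic
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("events")

// one RDD partition per Kafka partition
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.map(_._2).count().print()

ssc.start()
ssc.awaitTermination()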
On Sat, Feb 6, 2016 at 11:27 PM, Diwakar Dhanuskodi <
diwakar.dhanusk...@gmail.com> wrote:
> Yes . To
Hi Spark Users Group,
I have a CSV file to analyze with Spark, but I'm having trouble importing it
as a DataFrame.
Here's a minimal reproducible example. Suppose I have a
*10(rows)x2(cols)* *space-delimited csv* file, shown below:
1446566430 2015-11-0400:00:30
1446566430 2015-11-0400:00:30
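In case it helps, one plain-RDD way to turn such a space-delimited file into a DataFrame on Spark 1.x is to split the lines yourself. A sketch for the Spark shell (the file path and column names are made up; it assumes sc and sqlContext already exist):

import sqlContext.implicits._

case class Record(epoch: Long, timestamp: String)

val df = sc.textFile("path/to/file.csv")        // placeholder path
  .map(_.trim)
  .filter(_.nonEmpty)
  .map(_.split("\\s+"))                         // split on whitespace
  .map(cols => Record(cols(0).toLong, cols(1)))
  .toDF()

df.printSchema()
df.show()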
Hi, I am getting the following error while reading huge data from S3 and,
after processing, writing the data back to S3.
Did you find any solution for this?
16/02/07 07:41:59 WARN scheduler.TaskSetManager: Lost task 144.2 in stage
3.0 (TID 169, ip-172-31-7-26.us-west-2.compute.internal):
Thank you, Rui Sun, for the observation! It helped.
I have a new problem arising. When I create a small function for dummy
variable creation for a categorical column:
BDADummies <- function(dataframe, column) {
  cat.column <- vector(mode = "character", length = nrow(dataframe))
  cat.column <- collect(column)
Cody,
Yes, I am printing each message. It is processing all messages under
each DStream block.
Source systems are publishing 1 million messages / 4 secs, which is less than
the batch interval. The issue is with the direct stream processing 10 messages per
event. When partitions were
Hi,
Usually you can solve this in two steps (rough config sketch below):
make the RDD have more partitions;
play with the shuffle memory fraction.
In Spark 1.6, the cache vs. shuffle memory fractions are adjusted automatically.
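A rough sketch of those two knobs on Spark 1.x (the values and path are examples only):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-tuning-sketch")
  // give more of the heap to shuffle buffers (example value; legacy 1.x setting)
  .set("spark.shuffle.memoryFraction", "0.6")
  // shrink the cache fraction when little or nothing is cached (example value)
  .set("spark.storage.memoryFraction", "0.2")

val sc = new SparkContext(conf)

// step 1: more partitions means smaller per-task shuffle blocks
val repartitioned = sc.textFile("input.txt").repartition(400)  // placeholder path and count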
On 5 February 2016 at 23:07, Corey Nolet wrote:
> I just recently had a discovery that my