Try setting spark.streaming.kafka.maxRatePerPartition; this can help control
the number of messages read from Kafka per partition by the Spark Streaming
consumer.
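For example, a minimal sketch of setting it on the SparkConf (the app name,
rate, and batch interval are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Cap each Kafka partition at 1000 messages per second per batch.
val conf = new SparkConf()
  .setAppName("KafkaConsumer")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")
val ssc = new StreamingContext(conf, Seconds(10))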
-S
> On Mar 5, 2016, at 10:02 PM, Vinti Maheshwari wrote:
>
> Hello,
>
> I am trying to figure out why my
Hello,
I have one table with 2 fields in it:
1) item_id and
2) count
I want to sum the count field per item (i.e., group the item_ids).
Example input:
item_id  count
500      2
200      6
500      4
100      3
200      6
Required output (Result):
item_id  count
500      6
200      12
100      3
I used the command Result =
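For what it's worth, a minimal sketch of one way to get this output with the
DataFrame API, assuming the data is already loaded as a DataFrame named items
(the name is hypothetical):

import org.apache.spark.sql.functions.sum

val result = items.groupBy("item_id")
  .agg(sum("count").alias("count"))
result.show()
// Or equivalently in SQL, if items is registered as a temp table:
// sqlContext.sql("SELECT item_id, SUM(`count`) AS count FROM items GROUP BY item_id")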
Hello,
I am trying to figure out why my Kafka + Spark job is running slow. I found
that Spark is consuming all the messages from Kafka in a single batch and not
distributing any messages to the other batches.
2016/03/05 21:57:05
Hi there,
I hope someone can clarify this for me. It seems that some of the MLlib
algorithms, such as KMeans, Linear Regression, and Logistic Regression, have a
streaming version which can do online machine learning. But does that mean the
other MLlib algorithms cannot be used in Spark Streaming?
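For reference, the streaming variants follow a trainOn/predictOn pattern; a
minimal sketch with StreamingKMeans (the input stream, dimensionality, and
parameters are illustrative):

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vectors

// trainingStream is assumed to be a DStream[String] of lines like "[1.0,2.0]".
val trainingData = trainingStream.map(Vectors.parse)

val model = new StreamingKMeans()
  .setK(3)                   // number of clusters
  .setDecayFactor(1.0)       // weight all batches equally
  .setRandomCenters(2, 0.0)  // 2-dimensional data, zero initial weight
model.trainOn(trainingData)            // centers update as each batch arrives
model.predictOn(trainingData).print()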
Looking at the methods you call on HiveContext, they seem to belong
to SQLContext.
In FirstQuery, you can retrieve the SQLContext with the following method of
SQLContext:
def getOrCreate(sparkContext: SparkContext): SQLContext
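For example (a sketch, assuming sc is the SparkContext available in
FirstQuery; the query itself is hypothetical):

import org.apache.spark.sql.SQLContext

val sqlContext = SQLContext.getOrCreate(sc)
val rs = sqlContext.sql("SELECT 1 AS test")  // hypothetical query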
FYI
On Sat, Mar 5, 2016 at 3:37 PM, Mich Talebzadeh
Ted, thanks for the reply.
Yeah, there were just three nodes with HDFS datanodes and Spark workers
colocated. There was actually one more node with the Spark master (standalone)
and the namenode. And I've added one more Spark worker node, which sees the
whole HDFS just fine but doesn't have a colocated datanode process.
bq. I haven't added one more HDFS node to a hadoop cluster
Does each of the three nodes colocate with an HDFS datanode?
The absence of a 4th datanode might have something to do with the partition
allocation.
Can you show your code snippet?
Thanks
On Sat, Mar 5, 2016 at 2:54 PM, Eugene Morozov
Hi,
My cluster (standalone deployment), consisting of 3 worker nodes, was in the
middle of computations when I added one more worker node. I can see that the
new worker is registered with the master and that my job actually got one more
executor. I have configured default parallelism as 12, and thus I see
We'll need more information to help you: what commands did you use to launch
the slave/master, and what error message did you see in the driver logs?
Tim
> On Mar 5, 2016, at 4:34 AM, Mailing List wrote:
>
> I am trying to do the same but till now no luck...
> I have
I don't know what's wrong, but I can suggest looking up the source of the UDF
and debugging from there. I would think this is some JDK API caveat and not a
Spark bug.
--
Jan Sterba
https://twitter.com/honzasterba | http://flickr.com/honzasterba |
http://500px.com/honzasterba
On Fri, Mar 4, 2016 at
Hi,
I can use sbt to compile and run the following code. It works without any
problem.
I want to divide this into the object and another class. I would like to take
the result set joining the tables, identified by the DataFrame 'rs', and then
call the method "firstquerym" in the class FirstQuery to do the
It's not safe to use the direct committer with append mode; you may lose your
data.
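For context, a sketch of the 1.x-era setting that enables the direct committer
(the committer's package path varies across releases, so treat the class name
below as an assumption):

sqlContext.setConf("spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")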
On 4 March 2016 at 22:59, Jelez Raditchkov wrote:
> Working on a streaming job with DirectParquetOutputCommitter to S3
> I need to use PartitionBy and hence SaveMode.Append
>
> Apparently when
I have some code that simulates task failure in speculative mode.
I compile the code to a jar and execute it with
./bin/spark-submit --class com.test.SparkTest --jars --driver-memory 2g
--executor-memory 1g --master local[4] --conf spark.speculation=true
--conf spark.task.maxFailures=4
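For reference, a minimal sketch of how a straggler task can be simulated so
that speculation kicks in (class and app names are hypothetical; this is not
the original code):

import org.apache.spark.{SparkConf, SparkContext}

object SparkTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SpeculationTest"))
    val n = sc.parallelize(1 to 1000, 4).mapPartitionsWithIndex { (idx, iter) =>
      // Make partition 0 a straggler so spark.speculation relaunches its task.
      if (idx == 0) Thread.sleep(30000)
      iter
    }.count()
    println(s"count = $n")
    sc.stop()
  }
}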
bq. reportError("Exception while streaming travis", e)
I assume there was none of the above in your job.
What Spark release are you using?
Thanks
On Sat, Mar 5, 2016 at 4:57 AM, Dominik Safaric wrote:
> Dear all,
>
> Lately, as a part of a scientific
Hi Team,
I am facing an issue while writing a DataFrame back to a Hive table.
When using the "SaveMode.Overwrite" option, the table gets dropped, and Spark
is then unable to recreate it, throwing an error.
JIRA: https://issues.apache.org/jira/browse/SPARK-13699
E.g.
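A minimal sketch of the write pattern being described, assuming df is an
existing DataFrame (the table name is hypothetical):

import org.apache.spark.sql.SaveMode

// With SaveMode.Overwrite the existing Hive table is dropped first;
// the subsequent recreate step is where the error occurs.
df.write.mode(SaveMode.Overwrite).saveAsTable("my_hive_table")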
Hello Guys,
No help yet. Can someone reply to the above question on SO?
Thanks
Deepak
On Fri, Mar 4, 2016 at 5:32 PM, Deepak Gopalakrishnan wrote:
> Have added this to SO, can you guys share any thoughts ?
>
>
>
Dear all,
Lately, as part of a scientific research project, I've been developing an
application that streams (or at least should stream) data from Travis CI and
GitHub using their REST APIs. The purpose of this is to get insight into the
commit-build relationship, in order to further perform numerous
I am trying to do the same, but no luck so far...
I have everything running inside Docker containers, including the Mesos
master, Mesos slave, Marathon, and the Spark Mesos cluster dispatcher.
But when I try to submit the job using spark-submit as a Docker container, it
fails...
By the way, this setup is on
Hey,
We had the same issue with Spark 1.5.x, and it disappeared after we upgraded to 1.6.
Tamas
On Saturday, 5 March 2016, SLiZn Liu wrote:
> Hi Spark Mailing List,
>
> I’m running terabytes of text files with Spark on Mesos, the job runs fine
> until we decided to switch to
Hi, have a look at http://spark.apache.org/docs/latest/configuration.html to
see which ports need to be exposed. With Mesos we had a lot of problems with
container networking, but yes, --net=host is a shortcut.
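One approach is to pin the ports so they can be published from the container;
a sketch (the port numbers and hostname are arbitrary):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.port", "7001")        // fixed so it can be exposed
  .set("spark.blockManager.port", "7002")  // likewise
  .set("spark.driver.host", "driver.example.com")  // hypothetical hostname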
Tamas
On 4 March 2016 at 22:37, yanlin wang wrote:
> We would like