Hi,
I've implemented Twitter streaming as in the code given at the bottom of
this email. It finds some tweets based on the hashtags I'm following. However,
it seems that a large number of tweets are missing. I've tried to post some
tweets with the hashtags I'm following from within the application, and none
of them was picked up.
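The code referenced above did not survive the archive snippet; for context, a
minimal sketch of hashtag-filtered streaming with spark-streaming-twitter,
assuming the StreamingContext is called ssc and the hashtags are placeholders:

    import org.apache.spark.streaming.twitter.TwitterUtils

    // Hypothetical hashtags; passing None makes TwitterUtils read the
    // twitter4j.oauth.* credentials from system properties.
    val hashtags = Seq("#spark", "#kafka")
    val tweets = TwitterUtils.createStream(ssc, None, hashtags)
    tweets.map(_.getText).print()

Note that Twitter's public streaming endpoint only delivers a sampled subset
of matching tweets, not the full firehose, which may explain the gaps.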
Thanks for the suggestion, Cheng; I will try that today.
Are there any implications when reading the parquet data if there are
no summary files present?
Michael
On Sat, Jul 25, 2015 at 2:28 AM, Cheng Lian lian.cs@gmail.com wrote:
The time is probably spent by ParquetOutputFormat.commitJob.
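If the time is going into the summary files that commitJob writes, one knob to
try is the standard parquet-hadoop setting that disables them; a minimal
sketch, assuming a SparkContext named sc:

    // "parquet.enable.summary-metadata" is parquet-hadoop's job-summary flag;
    // setting it to false skips writing _metadata/_common_metadata on commit.
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")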
It's been there since Spark 1.1.0, I guess:
https://issues.apache.org/jira/browse/SPARK-1161
Thanks
Best Regards
On Sat, Jul 25, 2015 at 12:06 AM, Oren Shpigel o...@yowza3d.com wrote:
Sorry, I didn't mention I'm using the Python API, which doesn't have the
saveAsObjectFiles method.
Is there any alternative?
The thing is that the class it's complaining about is part of the Spark
assembly jar, not in my extra jar. The assembly jar was compiled with
-Phive, as proven by the fact that it works with the same SPARK_HOME
when run as a shell.
On 23 July 2015 at 17:33, Akhil Das wrote:
I don't think INSERT INTO is supported.
Hello,
I am a new user of Spark and need to know the best practice for the following
scenario:
- Spark Streaming receives XML messages from Kafka
- Spark transforms each message of the RDD (xml2json + some enrichments)
- Spark stores the transformed/enriched messages inside
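A minimal sketch of the first two steps, assuming Spark 1.x with
spark-streaming-kafka on the classpath; the ZooKeeper quorum, topic name, and
the xmlToJson/enrich helpers are placeholders:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(sc, Seconds(10))
    // createStream yields (key, value) pairs; keep the message payload.
    val xml = KafkaUtils
      .createStream(ssc, "zk1:2181", "xml-consumers", Map("xml-topic" -> 1))
      .map(_._2)
    // Hypothetical helpers: convert each XML message to JSON, then enrich it.
    val enriched = xml.map(msg => enrich(xmlToJson(msg)))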
Hello all
New Spark user here. We've been looking at the Spark ecosystem to
build some new parts of our log processing pipeline.
The spark-dataflow project looks especially interesting.
The windowing and triggers concepts look like a good fit for what we
need to do: our log data going into
Use foreachPartition and batch the writes (see the sketch after the quoted
message below).
On Sat, Jul 25, 2015 at 9:14 AM, nib...@free.fr wrote:
Hello,
I am a new user of Spark and need to know the best practice for the
following scenario:
- Spark Streaming receives XML messages from Kafka
- Spark transforms each message
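A minimal sketch of the batched-write pattern, assuming a hypothetical
StoreClient for whatever store the messages land in; the point is one
connection and grouped writes per partition rather than one write per record:

    enriched.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val client = StoreClient.connect()  // hypothetical; opened once per partition
        try {
          // Write in batches of 500 instead of record-by-record.
          records.grouped(500).foreach(batch => client.bulkInsert(batch.toSeq))
        } finally {
          client.close()
        }
      }
    }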
1 - How do I increase the level of parallelism in a Spark Streaming custom
receiver? (See the sketch after the signature below.)
2 - Will ssc.receiverStream(...) delete the data stored in Spark memory
via the store(s) logic?
--
Thanks & Regards,
Anshu Shukla
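On question 1, the usual approach is to run several receivers and union their
streams; a minimal sketch, assuming a custom receiver class named
EventReceiver (hypothetical):

    // Each receiverStream call starts its own receiver on a worker core,
    // so n receivers ingest in parallel; union merges them into one DStream.
    val numReceivers = 4
    val streams = (1 to numReceivers).map(_ => ssc.receiverStream(new EventReceiver()))
    val unified = ssc.union(streams)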
Hello Spark community,
I currently have a Spark 1.3.1 batch driver, deployed in YARN-cluster mode
on an EMR cluster (AMI 3.7.0), that reads input data through a HiveContext,
in particular SELECTing data from an EXTERNAL TABLE backed by S3. The
table has dynamic partitions and contains hundreds
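For reference, a minimal sketch of the read path described above; the table
name and partition filter are placeholders:

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // SELECT against the S3-backed external table; filtering on the dynamic
    // partition column limits which partitions are read.
    val df = hiveContext.sql(
      "SELECT * FROM my_external_table WHERE part_date = '2015-07-25'")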
Hi, I'm working with Spark Streaming using Scala, and trying to figure out the
following problem. In my DStream[(Int, Int)], each record is an int pair
tuple. For each batch, I would like to filter out all records whose first
integer is below the average of the first integers in this batch, and for all records
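A minimal sketch of the per-batch filter using transform, assuming the stream
is called pairs: DStream[(Int, Int)]:

    val filtered = pairs.transform { rdd =>
      if (rdd.isEmpty()) rdd
      else {
        // Average of the first tuple element, computed per batch.
        val avg = rdd.map(_._1.toDouble).mean()
        rdd.filter { case (first, _) => first >= avg }
      }
    }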
I just wanted an easy step-by-step guide as to exactly what version of
whatever to download for a proof-of-concept installation of Apache Spark on
Windows 7. I have spent quite some time following a number of different
recipes, to no avail. I have tried about 10 different permutations to date.
I
Hi
I have been using Spark for quite some time, with either Scala or Python. I
wanted to give Groovy a try through scripts for small tests.
Unfortunately, I get the following exception (using this simple script:
https://gist.github.com/galleon/d6540327c418aa8a479f)
Is there anything I am not
My eventGen is emitting 20,000 events/sec, and I am using store(s1)
in the receive() method to push data to the receiverStream.
But this logic only works fine up to 4,000 events/sec, and no batches
are seen emitting at higher rates.
CODE: TOPOLOGY -
JavaDStream<String> sourcestream =
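The topology code above is cut off in the archive; for comparison, a minimal
sketch of a custom receiver that pushes via store(), with a hypothetical
nextEvent() standing in for eventGen:

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver

    class EventGenReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
      def onStart(): Unit = {
        // Receive on a separate thread so onStart() returns immediately.
        new Thread("eventGen receiver") {
          override def run(): Unit = {
            while (!isStopped()) {
              store(nextEvent())  // hand one record to Spark's block manager
            }
          }
        }.start()
      }
      def onStop(): Unit = {}
      private def nextEvent(): String = ???  // hypothetical: pull from eventGen
    }

At 20,000 events/sec, the per-record store(s1) calls can become the
bottleneck; Receiver also has store(ArrayBuffer[T]) and store(Iterator[T])
overloads that push data in blocks rather than one record at a time.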