to improve performance
>
> On Thu, Sep 3, 2015 at 9:15 AM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> On SSD you will get around 30-40MB/s on a single machine (on 4 cores).
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Aug 31, 2015 at 3:13
Hi, I am trying to read MongoDB in Spark via newAPIHadoopRDD.
/* Code */
config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
config.set("mongo.input.uri",SparkProperties.MONGO_OUTPUT_URI);
config.set("mongo.input.query","{host: 'abc.com'}");
JavaSparkContext sc=new
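The snippet above is cut off mid-statement. A minimal sketch of the usual mongo-hadoop read pattern it appears to be following, assuming a local master and a placeholder URI in place of the poster's SparkProperties.MONGO_OUTPUT_URI constant:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;
import com.mongodb.hadoop.MongoInputFormat;

public class MongoRead {
    public static void main(String[] args) {
        // Hedged sketch: master and URI are placeholders, not the poster's values.
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("mongo-read").setMaster("local[*]"));

        Configuration config = new Configuration();
        config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
        config.set("mongo.input.uri", "mongodb://localhost:27017/db.collection");
        config.set("mongo.input.query", "{host: 'abc.com'}");

        // Keys are the documents' ObjectIds, values the BSON documents.
        JavaPairRDD<Object, BSONObject> docs =
                sc.newAPIHadoopRDD(config, MongoInputFormat.class,
                                   Object.class, BSONObject.class);
        System.out.println(docs.count());
        sc.stop();
    }
}
```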
Hi,
I have applied mapToPair and then a reduceByKey on a DStream to obtain a
JavaPairDStream<String, Map<String, Object>>.
I then have to apply a flatMapToPair and a reduceByKey on the DStream
obtained above.
But I do not see any logs from the reduceByKey operation.
Can anyone explain why this is happening?
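A likely cause: DStream transformations are lazy, so a flatMapToPair/reduceByKey chain never executes (and never logs) unless some output operation consumes the final stream. A minimal sketch, assuming a Spark 1.x socket source and simple word counts in place of the poster's actual functions:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import scala.Tuple2;

public class LazyStages {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("lazy-demo").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        JavaDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        // Spark 1.x flatMapToPair expects an Iterable (2.x expects an Iterator).
        JavaPairDStream<String, Integer> pairs = lines.flatMapToPair(line -> {
            List<Tuple2<String, Integer>> out = new ArrayList<>();
            for (String w : line.split(" ")) out.add(new Tuple2<>(w, 1));
            return out;
        });

        JavaPairDStream<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

        // Output operation: without this, none of the stages above ever run,
        // so nothing from them appears in the logs.
        counts.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```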
Hi Folks,
My Spark application interacts with Kafka to get data through the Java API.
I am using the Direct Approach (No Receivers), which uses Kafka's simple
consumer API to read data.
So, Kafka offsets need to be handled explicitly.
In case of Spark failure I need to save the offset state of
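With the direct approach, the Spark Streaming docs show reading the processed ranges inside foreachRDD via `((HasOffsetRanges) rdd.rdd()).offsetRanges()`; what you persist them to is up to you. A minimal plain-Java sketch of the kind of "topic-partition to last offset" state you would keep (a real job would write this to ZooKeeper, a database, or checkpoint files rather than a HashMap):

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: an illustrative in-memory offset store, not a durable one.
public class OffsetStore {
    private final Map<String, Long> offsets = new HashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    /** Record the offset to resume from for a topic partition. */
    public void save(String topic, int partition, long untilOffset) {
        offsets.put(key(topic, partition), untilOffset);
    }

    /** Offset to resume from after a restart; 0 when the partition is unseen. */
    public long restore(String topic, int partition) {
        return offsets.getOrDefault(key(topic, partition), 0L);
    }

    public static void main(String[] args) {
        OffsetStore store = new OffsetStore();
        store.save("events", 0, 42L);
        System.out.println(store.restore("events", 0));
        System.out.println(store.restore("events", 1));
    }
}
```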
Hi,
there are functions available to cache() or persist() an RDD in memory, but
I am reading data from Kafka in the form of a DStream, applying operations
to it, and I want to persist that DStream in memory for further use.
Please suggest how I can persist a DStream in memory.
Regards,
Deepesh
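For what it's worth, DStream exposes persist()/cache() just like RDD, and each batch's RDD is then kept at the given storage level. A minimal sketch, where `lines` stands for the DStream built from Kafka in the question:

```java
import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.streaming.api.java.JavaDStream;

public class PersistStream {
    // 'lines' is an assumption standing in for the poster's Kafka DStream.
    static JavaDStream<String> persistInMemory(JavaDStream<String> lines) {
        // Each generated RDD is then kept in memory for reuse by later
        // operations; cache() would use the DStream default level instead.
        return lines.persist(StorageLevels.MEMORY_ONLY);
    }
}
```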
Hi,
I am using the MongoDB-Hadoop connector to insert an RDD into MongoDB.
rdd.saveAsNewAPIHadoopFile("file:///notapplicable",
        Object.class, BSONObject.class,
        MongoOutputFormat.class, outputConfig);
But some operations require inserting the RDD data as an update operation
in Mongo.
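The mongo-hadoop connector can emit updates instead of inserts by using MongoUpdateWritable as the value type. A hedged sketch of that pattern, where `rdd` and `outputConfig` stand for the poster's own objects and the `(query, modifiers, upsert, multiUpdate)` constructor shape should be verified against your connector version:

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.bson.BSONObject;
import com.mongodb.BasicDBObject;
import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.MongoUpdateWritable;
import scala.Tuple2;

// Hedged sketch: turn each document into an upsert keyed by its _id.
JavaPairRDD<Object, MongoUpdateWritable> updates = rdd.mapToPair(doc ->
        new Tuple2<>((Object) null, new MongoUpdateWritable(
                new BasicDBObject("_id", doc.get("_id")),      // query
                new BasicDBObject("$set",
                        new BasicDBObject("host", doc.get("host"))), // modifiers
                true,      // upsert
                false)));  // multiUpdate

updates.saveAsNewAPIHadoopFile("file:///notapplicable",
        Object.class, MongoUpdateWritable.class,
        MongoOutputFormat.class, outputConfig);
```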
Hi,
I have successfully reduced my data and stored it in a
JavaDStream<BSONObject>.
Now I want to save this data in MongoDB; for this I have used the
BSONObject type.
But when I try to save it, it throws an exception.
I also tried saving it just with saveAsTextFile, but I get the same
exception.
Error
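One common fix: a DStream is saved per batch via foreachRDD, mapping the BSON documents to (key, value) pairs and reusing the same MongoOutputFormat call that works for plain RDDs. A minimal sketch, where `stream` and `outputConfig` stand for the poster's own objects:

```java
import org.apache.spark.streaming.api.java.JavaDStream;
import org.bson.BSONObject;
import com.mongodb.hadoop.MongoOutputFormat;
import scala.Tuple2;

// Hedged sketch: save each batch of the DStream through the Hadoop API.
stream.foreachRDD(rdd -> {
    rdd.mapToPair(doc -> new Tuple2<Object, BSONObject>(null, doc))
       .saveAsNewAPIHadoopFile("file:///notapplicable",
               Object.class, BSONObject.class,
               MongoOutputFormat.class, outputConfig);
    return null; // Spark 1.4's foreachRDD takes a Function returning Void
});
```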
Hi,
As a Spark job is executed when you run the start() method of
JavaStreamingContext, all the jobs like map and flatMap are defined
earlier; but even though you put breakpoints in those functions, the
breakpoints don't stop there, so how can I debug the Spark jobs?
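With a local master everything runs in one JVM, so IDE breakpoints inside map/flatMap do trigger, but only after start() once a batch actually arrives, because transformations are lazy until then. A minimal sketch of that setup, assuming a socket source as a stand-in for the poster's input:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class DebugLocally {
    public static void main(String[] args) throws Exception {
        // local[2] keeps driver and executors in one JVM, so breakpoints work.
        SparkConf conf = new SparkConf().setAppName("debug").setMaster("local[2]");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        jssc.socketTextStream("localhost", 9999)
            .map(s -> {
                // A breakpoint here is hit only after start(), once a batch
                // containing data is actually processed.
                return s.toUpperCase();
            })
            .print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```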
JavaDStream<String>
. Spark doesn't necessarily use these
anyway; it's from the Hadoop libs.
On Tue, Aug 4, 2015 at 8:30 AM, Deepesh Maheshwari
deepesh.maheshwar...@gmail.com wrote:
Can you elaborate on the things this native library covers?
One you mentioned is accelerated compression.
It would be very
. It just means you haven't installed and
configured native libraries for things like accelerated compression,
but it has no negative impact otherwise.
On Tue, Aug 4, 2015 at 8:11 AM, Deepesh Maheshwari
deepesh.maheshwar...@gmail.com wrote:
Hi,
When I run Spark locally on Windows it gives
Hi,
I am trying to read data from Kafka and process it using Spark.
I have attached my source code and error log.
For integrating Kafka, I have added this dependency in pom.xml:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
Hi,
When I run Spark locally on Windows it gives the Hadoop library error
below.
I am using the Spark version below:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.1</version>
</dependency>
2015-08-04 12:22:23,463
Hi,
I am new to Apache Spark and am exploring Spark+Kafka integration to
process data using Spark, which I did earlier in MongoDB Aggregation.
I am not able to figure out how to handle my use case.
Mongo Document :
{
    "_id" : ObjectId("55bfb3285e90ecbfe37b25c3"),
    "url" :