Re: Slow Mongo Read from Spark

2015-09-03 Thread Deepesh Maheshwari
to improve performance > > On Thu, Sep 3, 2015 at 9:15, Akhil Das <ak...@sigmoidanalytics.com> wrote: > >> On SSD you will get around 30-40 MB/s on a single machine (on 4 cores). >> >> Thanks >> Best Regards >> >> On Mon, Aug 31, 2015 at 3:13

Slow Mongo Read from Spark

2015-08-31 Thread Deepesh Maheshwari
Hi, I am trying to read MongoDB in Spark using newAPIHadoopRDD. /* Code */ config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat"); config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI); config.set("mongo.input.query", "{host: 'abc.com'}"); JavaSparkContext sc = new
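
A minimal sketch of the full read path described above, assuming the mongo-hadoop connector is on the classpath; SparkProperties.MONGO_OUTPUT_URI and the host filter are taken from the snippet:

    // Requires com.mongodb.hadoop.MongoInputFormat and org.bson.BSONObject on the classpath
    Configuration config = new Configuration();
    config.set("mongo.job.input.format", "com.mongodb.hadoop.MongoInputFormat");
    config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI); // connection string from the snippet
    config.set("mongo.input.query", "{host: 'abc.com'}");            // server-side filter

    JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("mongo-read"));

    // Each record arrives as a (document _id, BSONObject document) pair
    JavaPairRDD<Object, BSONObject> docs = sc.newAPIHadoopRDD(
            config, MongoInputFormat.class, Object.class, BSONObject.class);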

Write Concern used in Mongo-Hadoop Connector

2015-08-31 Thread Deepesh Maheshwari
Hi, I am using the below code to insert data into MongoDB from Spark. JavaPairRDD rdd; Configuration config = new Configuration(); config.set("mongo.output.uri", SparkProperties.MONGO_OUTPUT_URI); config.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");
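
One hedged option for the write-concern question: the write concern is part of the MongoDB connection string, so it can be carried in the output URI the connector already receives. The hostname, database, and collection below are placeholders, and whether the connector honours the URI options should be verified for your connector version:

    // Assumption: the connector builds its MongoClient from mongo.output.uri,
    // so standard URI options such as w= and journal= apply.
    Configuration outputConfig = new Configuration();
    outputConfig.set("mongo.output.uri",
            "mongodb://mongohost:27017/analytics.events?w=majority&journal=true");
    outputConfig.set("mongo.output.format", "com.mongodb.hadoop.MongoOutputFormat");

    rdd.saveAsNewAPIHadoopFile("file:///notapplicable",
            Object.class, BSONObject.class, MongoOutputFormat.class, outputConfig);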

Re: Slow Mongo Read from Spark

2015-08-31 Thread Deepesh Maheshwari
set("mongo.output.uri", mongodbUri); > > JavaPairRDD<Object,BSONObject> bsonRatingsData = > sc.newAPIHadoopFile( > ratingsUri, BSONFileInputFormat.class, Object.class, > BSONObject.class, bsonDataConfig); > > > Thanks > B

reduceByKey not working on JavaPairDStream

2015-08-26 Thread Deepesh Maheshwari
Hi, I have applied mapToPair and then a reduceByKey on a DStream to obtain a JavaPairDStream<String, Map<String, Object>>. I have to apply a flatMapToPair and reduceByKey on the DStream obtained above, but I do not see any logs from the reduceByKey operation. Can anyone explain why this is happening?
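
A minimal sketch of the chain described above, with hypothetical helpers mergeMaps and regroup; the key point is that DStream transformations are lazy, so reduceByKey produces no logs until an output operation is attached and the streaming context is started:

    // events: hypothetical JavaPairDStream<String, Map<String, Object>> produced by the first mapToPair
    JavaPairDStream<String, Map<String, Object>> firstReduce =
            events.reduceByKey((a, b) -> mergeMaps(a, b));   // mergeMaps: hypothetical merge of two value maps

    JavaPairDStream<String, Map<String, Object>> secondReduce = firstReduce
            .flatMapToPair(t -> regroup(t))                  // regroup: hypothetical, returns an Iterable of pairs (Spark 1.x API)
            .reduceByKey((a, b) -> mergeMaps(a, b));

    secondReduce.print();   // without an output operation such as print() or foreachRDD(), nothing runs at all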

Custom Offset Management

2015-08-26 Thread Deepesh Maheshwari
Hi Folks, My Spark application interacts with Kafka to get data through the Java API. I am using the Direct Approach (No Receivers), which uses Kafka's simple consumer API to read data, so Kafka offsets need to be handled explicitly. In case of a Spark failure I need to save the offset state of
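
A sketch of how the per-batch offsets can be captured with the direct approach (Spark Streaming Kafka 1.x API); directStream is the stream returned by KafkaUtils.createDirectStream, and persistOffset is a hypothetical method writing to whatever store you choose (ZooKeeper, MongoDB, a file):

    directStream.foreachRDD(rdd -> {
        // The underlying RDD of a direct stream carries the Kafka offset ranges for this batch
        OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
        for (OffsetRange r : ranges) {
            persistOffset(r.topic(), r.partition(), r.untilOffset());   // hypothetical persistence call
        }
        return null;   // the 1.x Java foreachRDD takes a Function returning Void
    });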

persist for DStream

2015-08-20 Thread Deepesh Maheshwari
Hi, there are functions available to cache() or persist() an RDD in memory, but I am reading data from Kafka in the form of a DStream, applying operations to it, and I want to persist that DStream in memory for further use. Please suggest a method by which I can persist a DStream in memory. Regards, Deepesh
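
DStreams expose the same persist()/cache() hooks as RDDs; a minimal sketch, where messages is a hypothetical JavaPairDStream read from Kafka:

    // Extract the message payloads from the (key, value) pairs
    JavaDStream<String> lines = messages.map(t -> t._2());

    // Keeps the RDDs generated for each batch in memory so later operations reuse them
    lines.persist(StorageLevel.MEMORY_ONLY());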

How to Handle Update Operation from Spark to MongoDB

2015-08-12 Thread Deepesh Maheshwari
Hi, I am using the MongoDB-Hadoop connector to insert an RDD into MongoDB. rdd.saveAsNewAPIHadoopFile("file:///notapplicable", Object.class, BSONObject.class, MongoOutputFormat.class, outputConfig); But some operations require inserting the RDD data as an update operation for Mongo
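
One hedged alternative to the insert-only save path: perform the upserts directly with the MongoDB Java driver inside foreachPartition, so each document becomes an update with upsert=true. The host, database, and collection names below are placeholders:

    rdd.foreachPartition(iter -> {
        MongoClient client = new MongoClient("mongohost", 27017);              // placeholder host
        DBCollection coll = client.getDB("analytics").getCollection("events"); // placeholder db/collection
        while (iter.hasNext()) {
            Tuple2<Object, BSONObject> t = iter.next();
            BasicDBObject doc = new BasicDBObject(t._2().toMap());
            Object id = doc.removeField("_id");   // match on _id, do not $set it
            coll.update(new BasicDBObject("_id", id),
                        new BasicDBObject("$set", doc),
                        true, false);             // upsert = true, multi = false
        }
        client.close();
    });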

Error while output JavaDStream to disk and mongodb

2015-08-11 Thread Deepesh Maheshwari
Hi, I have successfully reduced my data and stored it in a JavaDStream<BSONObject>. Now I want to save this data in MongoDB, for which I have used the BSONObject type. But when I try to save it, it throws an exception. I also tried to save it just with *saveAsTextFile* but got the same exception. Error
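
A hedged sketch of one way to write such a stream: MongoOutputFormat consumes key/value pairs, so the JavaDStream<BSONObject> (bsonStream below) first needs to become a pair DStream. outputConfig is the Configuration carrying mongo.output.uri as in the earlier messages, and whether a null key is acceptable (letting MongoDB assign the _id) should be verified for your connector version:

    JavaPairDStream<Object, BSONObject> pairs = bsonStream.mapToPair(
            doc -> new Tuple2<Object, BSONObject>(null, doc));

    pairs.foreachRDD(rdd -> {
        // The path is ignored by MongoOutputFormat but the API requires one
        rdd.saveAsNewAPIHadoopFile("file:///notapplicable",
                Object.class, BSONObject.class, MongoOutputFormat.class, outputConfig);
        return null;   // the 1.x Java foreachRDD takes a Function returning Void
    });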

Debugging Spark job in Eclipse

2015-08-05 Thread Deepesh Maheshwari
Hi, As the Spark job is executed when you call the start() method of JavaStreamingContext, all the operations like map and flatMap are already defined earlier; but even though you put breakpoints in those functions, the breakpoints don't stop there. So how can I debug the Spark jobs? JavaDStream<String>
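
One hedged, common approach: run everything in a single JVM with a local master, launch the main class from Eclipse in debug mode, and the breakpoints inside the map/flatMap functions are hit once the context has been started and a batch is actually processed:

    SparkConf conf = new SparkConf()
            .setAppName("debug-locally")
            .setMaster("local[2]");   // driver and executors share the Eclipse JVM

    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

    // define the DStream operations here, with breakpoints inside the functions

    jssc.start();              // nothing runs (and no breakpoint fires) before this
    jssc.awaitTermination();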

Re: Unable to load native-hadoop library for your platform

2015-08-04 Thread Deepesh Maheshwari
. Spark doesn't necessarily use these anyway; it's from the Hadoop libs. On Tue, Aug 4, 2015 at 8:30 AM, Deepesh Maheshwari deepesh.maheshwar...@gmail.com wrote: Can you elaborate on the things this native library covers? One you mentioned is accelerated compression. It would be very

Re: Unable to load native-hadoop library for your platform

2015-08-04 Thread Deepesh Maheshwari
. It just means you haven't installed and configured native libraries for things like accelerated compression, but it has no negative impact otherwise. On Tue, Aug 4, 2015 at 8:11 AM, Deepesh Maheshwari deepesh.maheshwar...@gmail.com wrote: Hi, When I run Spark locally on Windows it gives

NoSuchMethodError : org.apache.spark.streaming.scheduler.StreamingListenerBus.start()V

2015-08-04 Thread Deepesh Maheshwari
Hi, I am trying to read data from Kafka and process it using Spark. I have attached my source code and error log. For integrating Kafka, I have added the dependency in pom.xml: <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-streaming_2.10</artifactId>

Unable to load native-hadoop library for your platform

2015-08-04 Thread Deepesh Maheshwari
Hi, When I run Spark locally on Windows it gives the below Hadoop library error. I am using the below Spark version. <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.10</artifactId> <version>1.4.1</version> </dependency> 2015-08-04 12:22:23,463

Transform MongoDB Aggregation into Spark Job

2015-08-04 Thread Deepesh Maheshwari
Hi, I am new to Apache Spark and exploring Spark+Kafka integration to process data using Spark, which I did earlier with MongoDB Aggregation. I am not able to figure out how to handle my use case. Mongo Document : { _id : ObjectId(55bfb3285e90ecbfe37b25c3), url :
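
As an illustration only (the actual aggregation pipeline is not shown in the archive), a $group-by-url count would translate into a mapToPair followed by reduceByKey over the documents read from Mongo; docs is a hypothetical JavaPairRDD<Object, BSONObject> obtained via newAPIHadoopRDD as in the earlier read sketch:

    JavaPairRDD<String, Integer> hitsPerUrl = docs
            .mapToPair(t -> new Tuple2<>((String) t._2().get("url"), 1))
            .reduceByKey((a, b) -> a + b);

    hitsPerUrl.foreach(t -> System.out.println(t._1() + " -> " + t._2()));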