Didn't reply to all.

---------- Forwarded message ----------
From: Aaron McCurry <[email protected]>
Date: Fri, Oct 24, 2014 at 3:47 PM
Subject: Re: Some Performance number of Spark Blur Connector
To: Dibyendu Bhattacharya <[email protected]>

On Fri, Oct 24, 2014 at 2:19 PM, Dibyendu Bhattacharya <[email protected]> wrote:

> Hi Aaron,
>
> Here are some performance numbers comparing enqueueMutate and RDD
> saveAsHadoopFile, both using Spark Streaming.
>
> The setup I used is not very optimized, but it can give an idea of how
> both methods of indexing via Spark Streaming perform.
>
> I used a 4-node EMR m1.xlarge cluster and installed Blur as 1 controller
> and 3 shard servers. My Blur table has 9 partitions.
>
> On the same cluster, I was running Spark with 1 master and 3 workers. This
> is not a good setup, but anyway, here are the numbers.
>
> The enqueueMutate index rate is around 800 messages/second.
>
> The RDD saveAsHadoopFile index rate is around 12,000 messages/second.
>
> This is about 15 times faster, roughly one order of magnitude.

That's awesome, thank you for sharing!

> Not sure if this is an issue with the saveAsHadoopFile approach, but I can
> see that in the shard folder in HDFS lots of small Lucene *.lnk files are
> getting created (probably one for each saveAsHadoopFile call), and there
> are that many "inuse" folders, as you can see in the screenshot.
>
> These entries keep increasing to a huge number if the Spark Streaming job
> keeps running for some time. Not sure if this has any impact on indexing
> and search performance?

They should be merged and removed over time; however, if there is a permission problem, Blur might not be able to remove the inuse folders. Also, I have discovered and fixed a few resource-release problems over the past few days that might be contributing to the issue.

> For the enqueueMutate case, this type of folder structure is not seen,
> which is understood.
>
> Both the enqueue and saveAsHadoopFile code is here:
> https://github.com/dibbhatt/spark-blur-connector. Will attach the latest
> version to JIRA.

Thanks!

Aaron

> [image: Inline image 3]
>
> [image: Inline image 2]
