Didn't reply to all.

---------- Forwarded message ----------
From: Aaron McCurry <[email protected]>
Date: Fri, Oct 24, 2014 at 3:47 PM
Subject: Re: Some Performance number of Spark Blur Connector
To: Dibyendu Bhattacharya <[email protected]>




On Fri, Oct 24, 2014 at 2:19 PM, Dibyendu Bhattacharya <
[email protected]> wrote:

> Hi Aaron,
>
> here are some performance numbers comparing enqueueMutate and RDD
> saveAsHadoopFile, both using Spark Streaming.
>
> The setup I used is not a very optimized one, but it can give an idea of
> both methods of indexing via Spark Streaming.
>
> I used a 4-node EMR m1.xlarge cluster and installed Blur with 1 Controller
> and 3 Shard Servers. My Blur table has 9 partitions.
>
> On the same cluster, I was running Spark with 1 Master and 3 Workers. This
> is not an ideal setup, but anyway, here are the numbers.
>
> The enqueueMutate index rate is around 800 messages/second.
>
> The RDD saveAsHadoopFile index rate is around 12,000 messages/second.
>
> That is more than an order of magnitude faster.
>

That's awesome, thank you for sharing!


>
>
> Not sure if this is a issue with saveAsHadoopFile approach, but I can see
> in Shard folder in HDFS has lots of small Lucene *.lnk files are getting
> created ( probably for each saveAsHadoopFile call) and there are that many
> "insue" folders as you see in screen shot.
>
> And these entries keep increasing to a huge number if the Spark Streaming
> job keeps running for some time. Not sure if this has any impact on indexing
> and search performance?
>

They should be merged and removed over time; however, if there is a
permission problem, Blur might not be able to remove the inuse folders.
Also, I have discovered and fixed a few resource-release problems over the
past few days that might be contributing to the issue.


>
> For the enqueueMutate case, this type of folder structure is not seen, which
> is understood.
>
> Both the enqueueMutate and saveAsHadoopFile code is here:
> https://github.com/dibbhatt/spark-blur-connector. I will attach the latest
> version to the JIRA.
>

Thanks!

Aaron

>
>
> [image: Inline image 3]
>
>
> [image: Inline image 2]
>
