Re: [basex-talk] Distributed processing on roadmap ?

Mansi Sheth Thu, 20 Nov 2014 15:05:04 -0800

Sorry about the delay. I was busy preparing a presentation for my company
on BaseX as our analytics solution. It was very well received. All
thanks to you and everyone on this user list :)


Based on my use cases, I believe (again, I am no expert in this domain)
that a map/reduce approach would work better. The result set being returned
would contain at most a couple of thousand records, with some post-processing
on it, compared to the TBs of data being queried. If the querying and
processing steps could use the processing power of a cluster of nodes, we
might get a significant performance gain. What are your thoughts? What other
use cases have you come across?
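
To make the idea concrete, here is a minimal sketch of the pattern I have in
mind, written in Python rather than XQuery. Everything in it is an
assumption for illustration only: the in-memory PARTITIONS list stands in
for separate database instances on separate nodes, and the names
map_partition, reduce_counts and run are hypothetical, not part of BaseX.
Each partition is scanned independently and only a small partial result is
merged at the end:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for database partitions. In a real setup each
# entry would be a separate database instance, possibly on its own node.
PARTITIONS = [
    "<logs><event type='error'/><event type='info'/></logs>",
    "<logs><event type='error'/><event type='error'/></logs>",
    "<logs><event type='info'/></logs>",
]


def map_partition(xml_text):
    """Map step: scan one partition and return a small partial result."""
    root = ET.fromstring(xml_text)
    return Counter(event.get("type") for event in root.iter("event"))


def reduce_counts(partials):
    """Reduce step: merge the per-partition counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run(partitions):
    """Scan all partitions concurrently, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        return reduce_counts(pool.map(map_partition, partitions))


if __name__ == "__main__":
    print(dict(run(PARTITIONS)))
```

The point of the sketch is only that the data moved between the steps is
tiny compared to the data scanned, which is what my use cases look like.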

- Mansi

On Mon, Nov 17, 2014 at 10:50 AM, Christian Grün <christian.gr...@gmail.com>
wrote:

> Hi Mansi,
>
> it's nice to hear that you have been successfully scaling your
> database instances so far.
>
> > I love using BaseX and the powers of BaseX. Currently I am able to query
> > ~60GB of XML files in under 2.5 mins. I still have a few more
> > optimizations to try. I also see this data increasing to a couple of TB
> > shortly.
> >
> > I would love to see if this kind of processing can be almost real time
> > (within a min). So my question is: are there any discussions around
> > supporting distributed processing, clusters of nodes, etc.?
>
> Yes, distributed processing is a frequently discussed topic. One of
> our major questions is what challenge to solve first. As you surely
> know, there are so many different NoSQL stores out there, and all of
> them tackle different problems. Up to now, we spent most time on
> replication, but this would not give you better performance.
>
> So I would be interested to hear what kind of distribution techniques
> you believe would give you better performance. Do you think that a
> map/reduce approach would be helpful, or do you simply have lots of
> data that somehow needs to be sent to a client as quickly as possible?
> In other words, how large are your result sets? Do you really need
> the complete results, or would you rather like to draw some
> conclusions from the scanned data?
>
> Back to the current technology… Maybe you could do some Java profiling
> (using e.g. -Xrunhprof:cpu=samples) in order to find out what's the
> current bottleneck.
>
> Best,
> Christian
>



-- 
- Mansi
