Re: [akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-31 Thread Guido Medina
Hi Chelios,

I didn't mean that you should use an external engine or Riak; I meant that
you should learn what they do so that you can borrow ideas from these
frameworks and engines.

The point of map reduce is the ability to split a data set among nodes,
where the result of a query is the union of the filtered data from each
node. So how are you going to index your data per node?
The answer is sharding: once an actor starts on a node, it tells that
node's map reduce engine (your engine), "I'm residing here, index me."
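
A minimal sketch of that "index me" step, in plain Scala with no Akka
dependency. The names NodeIndex, register, unregister and search are
hypothetical and not taken from any library; in a real system this state
would presumably live inside one indexing actor per node, and each entity
actor would send it a message when it starts.

```scala
import scala.collection.mutable

// Hypothetical per-node index: entities on this node register their
// indexable text, and the index maps each lower-cased word to entity ids.
final class NodeIndex {
  private val postings = mutable.Map.empty[String, mutable.Set[String]]

  // Called when an entity starts on this node ("I'm residing here, index me").
  def register(entityId: String, text: String): Unit =
    tokenize(text).foreach { word =>
      postings.getOrElseUpdate(word, mutable.Set.empty[String]) += entityId
    }

  // Called when an entity stops or is moved to another node.
  def unregister(entityId: String): Unit = {
    postings.values.foreach(ids => ids -= entityId)
    postings.filterInPlace((_, ids) => ids.nonEmpty)
  }

  // Ids of local entities whose text contains every word of the query.
  def search(query: String): Set[String] = {
    val perWord: List[Set[String]] =
      tokenize(query).map(w => postings.get(w).map(_.toSet).getOrElse(Set.empty[String]))
    if (perWord.isEmpty) Set.empty[String] else perWord.reduce(_ intersect _)
  }

  private def tokenize(text: String): List[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toList
}
```

Because every node only indexes the entities that currently live on it, the
index is partitioned the same way the entities are.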

Do you get the idea now?
I don't think Distributed Data is ideal for this, because you need to split
the workload among nodes. The filtering task can be heavy, so you need
something that partitions the index data and rebuilds it for you when a
node goes down. That is why sharding is ideal. Say you have the following:

node 1 has items 1, 2 and 3
node 2 has items 4, 5 and 6
node 3 has items 7, 8 and 9

Each item has indexable properties such as name, description, etc.

So, how do you query? You send the query to the query coordinator on each
node, which prepares that node's result and sends it back. Since you send
the query to every node, every node answers back to you, right?
Then you concatenate those results and do whatever comes next. By sharding
you are scaling automatically. See my point now?
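
The query flow described above can be sketched as follows, again in plain
Scala without Akka. Item, NodeCoordinator and ScatterGather are hypothetical
names, and the sample items are made up for illustration; in an actor system
the calls to query would be messages sent to a coordinator actor on each
node.

```scala
// Hypothetical item with the indexable properties mentioned above.
final case class Item(id: Int, name: String, description: String)

// Stand-in for the per-node query coordinator: it filters only local items.
final class NodeCoordinator(localItems: Seq[Item]) {
  def query(term: String): Seq[Item] = {
    val t = term.toLowerCase
    localItems.filter { i =>
      i.name.toLowerCase.contains(t) || i.description.toLowerCase.contains(t)
    }
  }
}

object ScatterGather {
  // Send the same query to every node and concatenate the partial results.
  def queryAll(nodes: Seq[NodeCoordinator], term: String): Seq[Item] =
    nodes.flatMap(_.query(term))
}

object ScatterGatherDemo {
  // Made-up sample data laid out like the three nodes above.
  val node1 = new NodeCoordinator(Seq(
    Item(1, "sofa", "red leather"), Item(2, "chair", "oak"), Item(3, "lamp", "red shade")))
  val node2 = new NodeCoordinator(Seq(
    Item(4, "desk", "walnut"), Item(5, "rug", "red wool"), Item(6, "shelf", "pine")))
  val node3 = new NodeCoordinator(Seq(
    Item(7, "bed", "white frame"), Item(8, "stool", "steel"), Item(9, "mirror", "red frame")))

  val hits: Seq[Item] = ScatterGather.queryAll(Seq(node1, node2, node3), "red")
}
```

Adding a fourth node only adds one more entry to the list passed to
queryAll; each node still filters just its own items.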

HTH,

Guido.


Re: [akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-30 Thread Patrik Nordwall
You may also want to take a look at Akka Distributed Data
http://doc.akka.io/docs/akka/2.4.2/scala/distributed-data.html for
replicating the search index (small enough amount of data I guess).

/Patrik

Re: [akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-30 Thread Chelios
Hey Guido,

Thanks heaps for this info. I only have a little theoretical experience
with map reduce, so I will have to study the info you gave me.

The reason I thought of not using any external database is that I'm
trying to have every small Actor (Customer, Product etc) manage its own
small piece of data and live anywhere on the cluster. I'm hoping this will
get rid of the problem of sharding and partitioning the database. If I used
Riak, the data would live in Riak instead of in the Actors I instantiated
in my application, and I'm trying to manage the data myself. I'm not sure
if this is a good idea or not, but your comments are helping me.

Apache Crunch looks great; maybe there is a Scala client for it. I will
read up on it more.





Re: [akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-30 Thread Guido Medina
Even if you want to do it yourself, you still have to reduce data from a
map. There are papers you can read if you want to create your own
implementation of a "map reduce engine".
You won't escape that fact if you want your implementation to be
competitive. Take a look at Riak: they do the same in Erlang, they have
actors too, and they still have to use BloomFilters from Google.

They are all essentially derived from the same paper, which describes ways
to reduce data very fast using well-known hashing techniques.
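
As one illustration of such a hashing technique, a Bloom filter lets a node
say "definitely not here" cheaply before doing any real filtering. The sketch
below is plain Scala; the class name, the sizes and the double-hashing scheme
are assumptions chosen for illustration, not how Riak or any Google library
implements it.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical minimal Bloom filter: `bits` slots, `hashes` probes per key.
final class BloomFilter(bits: Int, hashes: Int) {
  private val slots = new java.util.BitSet(bits)

  // Derive each probe position from two base hashes (double hashing).
  private def positions(key: String): Seq[Int] = {
    val h1 = MurmurHash3.stringHash(key, 17)
    val h2 = MurmurHash3.stringHash(key, 31)
    (0 until hashes).map(i => Math.floorMod(h1 + i * h2, bits))
  }

  def add(key: String): Unit = positions(key).foreach(p => slots.set(p))

  // false means "definitely absent"; true means "possibly present".
  def mightContain(key: String): Boolean = positions(key).forall(p => slots.get(p))
}
```

A key that was added always reports true. A key that was never added usually
reports false but can occasionally report true, so a positive answer still
has to be confirmed against the real data.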

Guido.


Re: [akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-30 Thread Guido Medina
Hi Chelios,

The problem you are solving has two parts, and I think it has been solved
before. It is quite complex, but if you divide and conquer it might turn
out to be easy.
IMHO here are the main aspects of your problem:

   - Your data is distributed: each node holding data returns its result
   to the node that queried it.
   - A query coordinator actor (one of these has to live on each node for
   the sake of saving round-trips) sends the query to each node and
   expects a list of "map reduced" results.

The key is to "map reduce". I'm assuming you first want to get the list of
actors that match your search criteria and then, once you have that list,
do something with them or via them.
In that case you want an in-memory map reduce data structure on each node
that holds data. Assuming each node has a list of workers to parallelize
the query, the rest is simple:

Some ideas in the following link: http://www.infoq.com/articles/ApacheCrunch
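
A rough sketch of the per-node part, assuming the node's data is already
split into partitions with one worker per partition. It uses standard-library
Futures instead of actors to stay self-contained; NodeQuery and search are
hypothetical names, and the blocking Await is there only to keep the example
short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object NodeQuery {
  // "Map": each worker filters its own partition of the node's data.
  // "Reduce": the partial matches are merged into one list for the node.
  def search(partitions: Seq[Seq[String]], term: String): Seq[String] = {
    val partials: Seq[Future[Seq[String]]] =
      partitions.map(p => Future(p.filter(_.contains(term))))
    Await.result(Future.sequence(partials), 10.seconds).flatten
  }
}
```

Inside an actor you would not block; the merged result would be sent back to
the query coordinator as a message once all workers have replied.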

HTH,

Guido.


Re: [akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-30 Thread Chelios
Hi Konrad,

Your reply gave me the confidence to continue implementing the Actor-based
search. Thank you :D ... I'm doing this just for research purposes. I'm
trying to see if I can get a high-performance, distributed, in-memory
system using only event sourcing with Akka Actors, without any other
external database or tool apart from an Eventstore database.

Can I also attend the workshop? Seeing how Actor design patterns apply to
designing a search engine architecture is something I need to learn for
this :)

Cheers,
Chel

On Wednesday, March 30, 2016 at 7:29:33 PM UTC+11, Konrad Malawski wrote:
>
> Technically it's doable, but I'm not sure that it will reduce complexity :-)
> Search really has to be "good" in order to be useful; "fast but bad
> results" often won't satisfy anyone, so I'm not sure implementing your own
> custom search engine is a good idea (unless that is exactly the goal of
> your business: to be a search engine).
>
> Fun fact: one of the workshops I run is basically that, a multi-tier
> search engine architecture. However, it depends on whether your entire job
> is to build the search engine, or whether you should just use an
> out-of-the-box one because it's one of the 100 things you work on :-)
>
> -- 
> Cheers,
> Konrad 'ktoso' Malawski
> Akka @ Lightbend
>
> On 30 March 2016 at 08:42:57, Chelios (chelios@gmail.com) wrote:
>
> Hey guys,
>
> I've got an event-sourced application (not CQRS: reads and writes are
> both on the write side). The state of all the entities/aggregates/actors
> is stored in memory because the data is not going to exceed 120GB and I
> have a machine with 265GB RAM.
>
> *Problem:*
> Suppose I have a million Products where each *Product* is an Actor
> supervised by *ProductSupervisorActor* and I want to perform the
> following query:
> *Query*: Find all the products where the *product description* matches
> some user input.
>
> I'm wondering if I could get away with just querying the state of the
> million actors and aggregating the result into one
> *SearchRequestHandlerActor* instead of using a search database like SOLR?
> I've used SOLR before and it's super fast, but I'm just trying to reduce
> the complexity in my application. If the state is already in memory, maybe
> I can just find a way to query it instead of introducing another moving
> part (SOLR) into the system that I have to manage while making sure that
> the data is synchronized.
>
> I would really like to find a solution that performs the above query
> efficiently, with paging, using just Actors. If I can achieve this, then I
> can have *ProductActor*s running anywhere in a cluster and the search
> would work just fine. If I were using SOLR instead, I would have to shard
> or partition the database, which is just another hassle.
>
> Right now I've got a *ProductSearchRequestHandlerActor* which, on
> initialization, accepts *totalNumberOfMessagesExpect: Long* and then
> accepts messages of type *Option[ProductState]* until
> *totalNumberOfMessagesExpect* is reached. *I have not implemented paging
> yet.*
>
> I just wanted to get your opinions, ideas, or tips on how I can achieve
> this efficiently, or to hear whether I'm being silly for trying this
> because there is no central index.
>
> Chel
>
>

-- 
>>  Read the docs: http://akka.io/docs/
>>  Check the FAQ: 
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>  Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.


Re: [akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-30 Thread Konrad Malawski
Technically it's doable, but I'm not sure that'll reduce complexity :-)
Search really has to be "good" in order to be useful; "fast but bad 
results" often won't satisfy anyone,
so I'm not sure implementing your own custom search engine is a good idea 
(unless that is exactly the goal of your business: 
being a search engine).

Fun fact: one of the workshops I do is basically that, a multi-tier search 
engine architecture. However, it depends on whether your entire job is to build 
the search engine, or whether you should just use an out-of-the-box one because 
it's one of the 100 things you work on :-)

-- 
Cheers,
Konrad 'ktoso’ Malawski
Akka @ Lightbend

On 30 March 2016 at 08:42:57, Chelios (chelios.banda...@gmail.com) wrote:

Hey guys

I've got an event-sourced application (not CQRS: reads and writes are both 
on the write side). The state of all the entities/aggregates/actors is stored 
in memory because the data is not going to go above 120GB and I have a 
machine with 265GB RAM.

Problem:
Suppose I have a million Products where each Product is an Actor supervised by 
ProductSupervisorActor and I want to perform the following query:
Query: Find all the products where the product description matches some user 
input.

I'm wondering if I could get away with just querying the state of the million 
actors and aggregating the result into one SearchRequestHandlerActor instead of 
using a search database like SOLR? I've used SOLR before and it's super fast, 
but I'm just trying to reduce the complexity in my application. If the state is 
already in memory, maybe I can just find a way to query it instead of 
introducing another moving part (SOLR) into the system that I have to manage 
and keep in sync with the data.

I would really like to find a way to perform the above query efficiently 
by just using Actors with paging. If I can achieve this then I can have 
ProductActors running anywhere in a cluster and the search would work just 
fine. If I were using SOLR instead, I would have to shard or partition the 
database, which is just another hassle.

Right now I've got a ProductSearchRequestHandlerActor which, on initialization, 
accepts totalNumberOfMessagesExpect: Long and then accepts messages of type 
Option[ProductState] until totalNumberOfMessagesExpect is reached. I have 
not implemented paging yet. 

I just wanted to get your opinions, ideas, or tips on how I can achieve this 
efficiently, or to hear whether I'm being silly for trying this because there 
is no central index?

Chel



[akka-user] Can I use in-memory Actor state as a search engine ?

2016-03-30 Thread Chelios
Hey guys

I've got an event-sourced application (not CQRS: reads and writes are 
both on the write side). The state of all the entities/aggregates/actors is 
stored in memory because the data is not going to go above 120GB and I 
have a machine with 265GB RAM.

*Problem:*
Suppose I have a million Products where each *Product* is an Actor 
supervised by *ProductSupervisorActor* and I want to perform the following 
query:
*Query*: Find all the products where the *product description* matches some 
user input.

I'm wondering if I could get away with just querying the state of the 
million actors and aggregating the result into one 
*SearchRequestHandlerActor* instead of using a search database like SOLR? 
I've used SOLR before and it's super fast, but I'm just trying to reduce the 
complexity in my application. If the state is already in memory, maybe I 
can just find a way to query it instead of introducing another moving part 
(SOLR) into the system that I have to manage and keep in sync with the 
data.
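
To make "querying the state" concrete, here is a minimal sketch in plain Scala, without Akka, so it runs on its own. Everything in it is an assumption for illustration: the fields of `ProductState`, the names `matches`, `answer` and `search`, and especially the matching rule (a case-insensitive substring test), since nothing above says how a description should match the user input.

```scala
object DescriptionQuery {
  // Hypothetical shape of the in-memory state held by one Product actor.
  final case class ProductState(id: Long, name: String, description: String)

  // Assumed matching rule: case-insensitive substring match on the description.
  def matches(state: ProductState, userInput: String): Boolean =
    state.description.toLowerCase.contains(userInput.toLowerCase)

  // What a single Product could reply with: Some(state) on a match, None otherwise.
  def answer(state: ProductState, userInput: String): Option[ProductState] =
    if (matches(state, userInput)) Some(state) else None

  // Stand-in for asking every Product and aggregating the replies in one place.
  def search(all: Seq[ProductState], userInput: String): Seq[ProductState] =
    all.flatMap(p => answer(p, userInput).toList)
}
```

Note that this checks every product on every query, which is exactly the cost a central index would avoid.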

I would really like to find a way to perform the above query 
efficiently by just using Actors with paging. If I can achieve this then I 
can have *ProductActor*s running anywhere in a cluster and the search would 
work just fine. If I were using SOLR instead, I would have to shard or 
partition the database, which is just another hassle.

Right now I've got a *ProductSearchRequestHandlerActor* which, on 
initialization, accepts *totalNumberOfMessagesExpect: Long* and then accepts 
messages of type *Option[ProductState]* until 
*totalNumberOfMessagesExpect* is reached. *I have not implemented paging 
yet.* 
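
Roughly, the bookkeeping inside that handler could look like the sketch below. It is plain Scala rather than an actual actor, so it is self-contained; `Pending`, `record`, `isComplete` and `start` are hypothetical names, and the two `ProductState` fields are assumed. Only `totalNumberOfMessagesExpect` and `Option[ProductState]` come from the description above.

```scala
object SearchAggregation {
  // Assumed minimal product state; the real one would carry more fields.
  final case class ProductState(id: Long, description: String)

  // Immutable bookkeeping the handler could keep between messages.
  final case class Pending(expected: Long, received: Long, hits: Vector[ProductState]) {
    // Record one reply; None means "this product did not match".
    def record(reply: Option[ProductState]): Pending =
      copy(received = received + 1, hits = hits ++ reply.toList)

    // True once every expected reply has arrived.
    def isComplete: Boolean = received >= expected
  }

  // Initial state for a query that expects the given number of replies.
  def start(totalNumberOfMessagesExpect: Long): Pending =
    Pending(totalNumberOfMessagesExpect, 0L, Vector.empty)
}
```

One caveat with counting replies: if a single reply never arrives, `isComplete` never becomes true, so a real handler would probably also need a timeout.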

I just wanted to get your opinions, ideas, or tips on how I can achieve this 
efficiently, or to hear whether I'm being silly for trying this because there 
is no central index?

Chel
