Performance Issue in Streaming Expressions

2017-06-01 Thread thiaga rajan
We are working on a proposal and feel that the streaming API together with the 
export handler is the best fit for our use cases. We already have a structure in 
Solr on which we use graph queries to produce a hierarchical structure. From 
that structure we now need to join a couple more collections.

We have 5 different collections:
Collection 1 - 800k records
Collection 2 - 200k records
Collection 3 - 7k records
Collection 4 - 6 million records
Collection 5 - 150k records

We are using the strategy below:

innerJoin(intersect(innerJoin(Collection 1, Collection 2), innerJoin(Collection 3, Collection 4)), Collection 5)

Performance becomes too slow as soon as Collection 4 is involved. With just 
Collections 1, 2 and 5 the results come back in 2 seconds. The moment I include 
Collection 4 in the query I can see a performance impact. I believe exporting 
the large result set from Collection 4 is causing the issue. Currently I am 
using a single-shard collection with no replicas. As a first option I am 
thinking of increasing the memory, since processing docValues needs more 
memory. If that does not work, I can look at parallel streams / sharding. 
Kindly advise: is there anything else I am missing?
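
For reference, the join strategy above could be written out as a full streaming expression roughly like this. This is only a sketch: the field name join_key and the collection names collection1 to collection5 are placeholders, and it assumes all five collections share one join field, which may not match the real schema. It relies on two documented points: innerJoin and intersect expect both input streams to be sorted on the "on" field, and the /export handler needs docValues on the fields it returns and sorts by.

```python
# Build the nested streaming expression as a string.
# KEY and the collection names are hypothetical placeholders.

KEY = "join_key"  # assumed shared join field, not taken from the real schema


def search(collection, fields, sort_field, q="*:*"):
    """Render a search() source that streams the full result set via /export."""
    fl = ",".join(fields)
    return (
        f'search({collection}, q="{q}", fl="{fl}", '
        f'sort="{sort_field} asc", qt="/export")'
    )


def inner_join(left, right, on):
    """Render innerJoin(); both inputs must already be sorted on `on`."""
    return f'innerJoin({left}, {right}, on="{on}")'


def intersect(left, right, on):
    """Render intersect(); both inputs must already be sorted on `on`."""
    return f'intersect({left}, {right}, on="{on}")'


# One sorted /export source per collection.
src = {n: search(f"collection{n}", [KEY], KEY) for n in range(1, 6)}

expr = inner_join(
    intersect(
        inner_join(src[1], src[2], KEY),
        inner_join(src[3], src[4], KEY),
        KEY,
    ),
    src[5],
    KEY,
)

print(expr)
```

Written this way, it is visible that the collection4 source exports and sorts all 6 million records before the join can filter anything, so narrowing its q or fl is one thing to try before adding memory.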

Joins using graph queries - solr 6.0

2017-05-21 Thread thiaga rajan
Hi - We have a tree-based structure in Solr and use Solr graph queries to 
perform some searches in our use cases, for example {!graph from=xx to=yy}. 
We have a new requirement: we need to find the last-level nodes under certain 
parents in the tree, and those last-level nodes then need to be used as a 
filter on another query (different core/collection). So the tree is in one 
core, where we need to find the last-level nodes based on the parent nodes, and 
these child nodes should be applied as a filter to another core. Rather than 
taking a two-step approach (fetching the last-level children first and then 
applying them to the second query), can we do this in a single query by joining 
these two cores/collections?


Traversal Filter in Graphical Query parser

2016-06-13 Thread thiaga rajan
Hi - I am using the graph query parser that was introduced in Solr 6. In the 
traversal filter, I have provided a condition on one field and it is working 
fine, but I was not able to provide conditions on multiple fields.
Please find the queries below.

Working:
{!graph from=HIERARCHY_LEVEL_PARENT_KEY to=HIERARCHY_LEVEL_KEY traversalFilter=HIERARCHY_ID:201}(HIERARCHY_ID:201 AND (HIERARCHY_LEVEL_KEY:451 OR HIERARCHY_LEVEL_KEY:59734))

Not working:
{!graph from=HIERARCHY_LEVEL_PARENT_KEY to=HIERARCHY_LEVEL_KEY traversalFilter=HIERARCHY_ID:201 OR DWH_COLUMN_NAME:P1_NO}(HIERARCHY_ID:201 AND (HIERARCHY_LEVEL_KEY:451 OR HIERARCHY_LEVEL_KEY:59734))

Am I missing something here?
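
One thing that may be worth checking (an assumption, not something confirmed in this thread): Solr local-parameter values that contain spaces generally need to be quoted, so an unquoted traversalFilter may be cut off at the first space. The sketch below only shows how the query string could be assembled with the filter quoted. The helper name graph_query is made up for illustration, and whether quoting resolves the problem above is untested.

```python
def graph_query(from_field, to_field, root_query, traversal_filter=None):
    """Assemble a {!graph ...} query string.

    The traversalFilter value is wrapped in double quotes so that spaces
    inside it (e.g. around OR) stay part of the one local parameter.
    """
    params = [f"from={from_field}", f"to={to_field}"]
    if traversal_filter:
        # Escape backslashes and double quotes inside the quoted value.
        escaped = traversal_filter.replace("\\", "\\\\").replace('"', '\\"')
        params.append(f'traversalFilter="{escaped}"')
    return "{!graph " + " ".join(params) + "}" + root_query


query = graph_query(
    "HIERARCHY_LEVEL_PARENT_KEY",
    "HIERARCHY_LEVEL_KEY",
    "(HIERARCHY_ID:201 AND (HIERARCHY_LEVEL_KEY:451 OR HIERARCHY_LEVEL_KEY:59734))",
    traversal_filter="HIERARCHY_ID:201 OR DWH_COLUMN_NAME:P1_NO",
)
print(query)
```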

Fw: Select distinct multiple fields

2016-05-19 Thread thiaga rajan


Thanks Joel for the response. Our requirement involves some logic that has to 
run after fetching the results from Solr, which may affect how the pagination 
works out.

That is, our data is structured as shown below (the nested structure is 
flattened). We need this kind of structure because we may have to support other 
use cases that a hierarchical structure would not support. So we have flattened 
our data structure, and we need to search within the flat structure below.

| Level1 | Level2 | Level3 |
| 1 | 11 | 111 |
| 1 | 11 | 112 |
| 1 | 11 | 113 |
| 1 | 11 | 114 |

Example - When the customer enters 11, we may need to search for this term 
across the entire data structure, so we will get all the records, including 
Level3. Ideally, though, we need to select only 1,11 (filtering to the current 
level and its parent level). Another problem is pagination. We might select 10 
records, for example, but after filtering to the levels/parents that match the 
search keyword, the number of records may be reduced. So we may need to send 
another request to Solr to get the next set, and again work out the level and 
parent that match the search keyword, until we reach the required row count.
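
The repeated-request approach described above might look roughly like this on the client side. It is a sketch only: fetch_page stands in for whatever call actually queries Solr with start/rows (here it just slices an in-memory list), and the exact-match rule in project_11 is an assumption made for illustration.

```python
def collect_distinct(fetch_page, project, wanted, page_size=10):
    """Keep requesting pages until `wanted` distinct projected rows are found.

    fetch_page(start, rows) -> list of raw rows (empty list when exhausted)
    project(row)            -> hashable key to keep, or None to drop the row
    Returns (distinct_keys, next_raw_offset) so a later call can resume.
    """
    if wanted <= 0:
        return [], 0
    seen, out, start = set(), [], 0
    while True:
        page = fetch_page(start, page_size)
        if not page:
            return out, start  # source exhausted before reaching `wanted`
        for i, row in enumerate(page):
            key = project(row)
            if key is None or key in seen:
                continue  # filtered out, or a duplicate of an earlier row
            seen.add(key)
            out.append(key)
            if len(out) == wanted:
                return out, start + i + 1
        start += len(page)


# In-memory stand-in for the flat Level1/Level2/Level3 documents above.
ROWS = [("1", "11", "111"), ("1", "11", "112"),
        ("1", "11", "113"), ("1", "11", "114")]


def fetch_page(start, rows):
    return ROWS[start:start + rows]


def project_11(row):
    # Keep the path down to the level equal to "11"; drop non-matching rows.
    return row[:row.index("11") + 1] if "11" in row else None


result, next_offset = collect_distinct(fetch_page, project_11, wanted=10, page_size=2)
print(result, next_offset)
```

The cost is the point of the question: every page that collapses to duplicates triggers another round trip, which is why doing the distinct inside Solr would be preferable.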

Rather than doing this, is there a way (some kind of plugin such as a 
SearchComponent) that would help with the above scenario, or a better way to 
achieve this in Solr? Kindly provide your valuable suggestions on this.



On Thursday, 19 May 2016 6:11 PM, Joel Bernstein <joels...@gmail.com> wrote:

https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface?focusedCommentId=62697742#ParallelSQLInterface-SELECTDISTINCTQueries

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 1:10 PM, Joel Bernstein <joels...@gmail.com> wrote:

The SQL interface and Streaming Expressions support selecting multiple distinct 
fields.
The SQL interface can use the JSON facet API or MapReduce to provide the 
results.
The facet function and the unique function are the Streaming Expressions that 
the SQL interface calls.
Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 12:41 PM, thiaga rajan <ecethiagu2...@yahoo.co.in.invalid> wrote:

Hi Team - I have seen that selecting distinct values across multiple fields is 
not possible in Solr, and I have seen faceting and grouping suggested instead. 
I have a question: is there any kind of plugin or custom implementation with 
which we can achieve the same?
1. Using a plugin or a custom implementation, will we be able to achieve select 
distinct on fields, apart from facet and group by? Pagination is the issue.
For example - We set a page size of 10. If we get 10 records (including 
duplicates), we might end up with fewer than 10 results after removing them.
Any suggestions on this?





  

Select distinct multiple fields

2016-05-19 Thread thiaga rajan
Hi Team - I have seen that selecting distinct values across multiple fields is 
not possible in Solr, and I have seen faceting and grouping suggested instead. 
I have a question: is there any kind of plugin or custom implementation with 
which we can achieve the same?
1. Using a plugin or a custom implementation, will we be able to achieve select 
distinct on fields, apart from facet and group by? Pagination is the issue.
For example - We set a page size of 10. If we get 10 records (including 
duplicates), we might end up with fewer than 10 results after removing them.
Any suggestions on this?

Hierarchical Support - Solr

2016-05-18 Thread thiaga rajan


Hi Team,
   We are exploring Solr as the search engine for one of our projects. It is 
really a great tool in terms of indexing and response time. While exploring, we 
came up with the questions and understandings below. Kindly confirm them.

We are trying to implement a search engine for hierarchical search (a tree 
structure). We have flattened our data structure and exported the data into 
Solr, as Solr is geared more toward flat structures in terms of request and 
response.
Example:
1
  -- 11 -- 111, 112, 113
  -- 12 -- 121
  -- 13 -- 131, 132, 133
Our data structure resembles the table below, and we have exported it to Solr 
in that form. When the customer enters a search key, we need to search within 
this structure. Each row in the table below corresponds to one document in 
Solr.
We were able to achieve this with the structure below but need to confirm the 
following items.
1. Search the levels for the keyword and return only the matching node and its 
parents, not the children of the node.
Example - if the user enters 12, then we need only 1,12 and not the children 
(i.e. 121).
I assume Solr does not offer this out of the box and we need to write a custom 
implementation for it. Correct me if I am wrong.
2. We need to do a distinct selection across the documents. For example, if the 
search keyword is 11, then I should return 1,11 (like a SQL SELECT DISTINCT 
L1, L2). I have read various forums on this and it looks like we have options 
around faceting and grouping, but they are not exactly the same as a SQL 
distinct.
If we are not able to get distinct results, how will we apply pagination? For 
example, if the page size is 10 and 5 of the documents are duplicates, then 
after taking the distinct of those the page falls short, as we still have space 
for another 5 records. We understand faceting as arranging the results by a 
faceted field rather than returning the results in the requested way.
Please confirm our understanding of the above options. Thanks.
Document structure below:



| L1 | L2 | L3 |
| 1 | 11 | 111 |
| 1 | 11 | 112 |
| 1 | 11 | 113 |
| 1 | 12 | 121 |
| 1 | 13 | 131 |
| 1 | 13 | 132 |
| 1 | 13 | 133 |
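
To make items 1 and 2 concrete, here is a small sketch of the result we are after, computed in plain Python over the rows of the table above rather than in Solr. The function names are made up for illustration, and matching a level by exact equality with the keyword is an assumption.

```python
# Rows of the L1 | L2 | L3 table above, one tuple per Solr document.
ROWS = [
    ("1", "11", "111"),
    ("1", "11", "112"),
    ("1", "11", "113"),
    ("1", "12", "121"),
    ("1", "13", "131"),
    ("1", "13", "132"),
    ("1", "13", "133"),
]


def path_to_match(row, keyword):
    """Return the row truncated at the level equal to `keyword`, else None.

    The matching node and its parents are kept; deeper levels are dropped.
    """
    for depth, value in enumerate(row):
        if value == keyword:
            return row[:depth + 1]
    return None


def distinct_paths(rows, keyword):
    """Collect each matching path once, in first-seen order (SELECT DISTINCT)."""
    seen = set()
    paths = []
    for row in rows:
        path = path_to_match(row, keyword)
        if path is not None and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


print(distinct_paths(ROWS, "12"))
print(distinct_paths(ROWS, "11"))
```

For keyword 11, three documents match, but only the single path 1,11 should come back; that is the behaviour we would like Solr itself to provide, so that page sizes are counted on the distinct paths.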