Re: Question regarding the SQL interface

2016-05-19 Thread Joel Bernstein
I just reviewed the testPredicate method in the test cases:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/handler/TestSQLHandler.java

All the test cases in testPredicate() are formatted like regular SQL. Given
the way things are designed, I don't think you could make a valid query that
combines fields the way a typical search does.

You will have to make separate calls for each aggregation. To get faceting
performance you would use the facet aggregationMode.

The SQL predicate gets rewritten to a valid Solr query, and then gets
handled by the QueryComponent, like a regular query. So any field
definitions should work fine. But scoring is only performed for queries
with a LIMIT clause.

With the cardinality issue you'll need to experiment a little to see where
the facet mode starts to slow down and lose accuracy. In the future we'll
be moving to streaming facets so cardinality won't be an issue even in
facet mode. So in future releases MapReduce will only be used to handle
distributed joins.

In facet mode it uses the JSON facet API. It scales reasonably well, but I
don't believe it provides fully accurate counts because it doesn't do the
refinement step. In my testing I didn't push it far enough to make it fall
over, but it eventually will, because it keeps all the aggregation buckets in
memory at once. MapReduce mode is always accurate no matter how high the
cardinality gets.
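For reference, the choice between the two modes is made per request. A minimal sketch of building such a request against the Solr 6 /sql handler (the endpoint path and the stmt/aggregationMode parameter names are from the Parallel SQL docs; the collection and statement are illustrative):

```python
from urllib.parse import urlencode

def sql_request_path(collection, stmt, aggregation_mode="facet"):
    """Build the request path for Solr's /sql handler (Solr 6+).

    aggregation_mode: "facet" (JSON facet API: fast, but buckets are held
    in memory) or "map_reduce" (accurate at any cardinality).
    """
    params = {"stmt": stmt, "aggregationMode": aggregation_mode}
    return "/solr/%s/sql?%s" % (collection, urlencode(params))

# Illustrative aggregation query using facet mode:
path = sql_request_path(
    "collection1",
    "SELECT title, count(*) FROM collection1 GROUP BY title",
)
```

Switching `aggregation_mode` to `"map_reduce"` trades speed for exact counts at high cardinality, per the discussion above.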





Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 4:05 PM, Vachon, Jean-Sébastien <
jvac...@cebglobal.com> wrote:

> Hi all,
>
> I am planning into migrating our application from SolrJ to the SQL
> interface and I have some questions regarding some of Solr features…
>
>
>   *   How can we specify multiple search fields on a keyword. Do we have
> to handle everything by ourselves like in regular SQL?
>
> SELECT x,y,z FROM collection1 WHERE title='abc' OR description='abc'
>
> Is there a special syntax to allow to search into multiple fields at once?
>
>
>   *   Do you have to generate separate requests to get faceting
> information? Would translating the following query into its SQL equivalent
> require 3 queries?
>
> /select?q=title:abc&facet=true&facet.field=xyz&facet.field=def
>
>
>   *   If our schema contains a fieldType using a custom similarity class…
> will the SQL interface honour that mapping?
>
>   *   The documentation about Streaming Expressions and SQL interface are
> referring to terms like “high cardinality” and “very high cardinality”.
> What do they exactly mean? Are we talking about hundreds, thousands or
> millions of different values? Does this depend on other aspect of the
> collection like the size of the documents?
>
> Thanks for your input and guidance
>
>
>
> CEB Canada Inc. Registration No: 1781071. Registered office: 199 Bay
> Street Commerce Court West, # 2800, Toronto, Ontario, Canada, M5L 1AP.
>
>
>
> This e-mail and/or its attachments are intended only for the use of the
> addressee(s) and may contain confidential and legally privileged
> information belonging to CEB and/or its subsidiaries, including SHL. If you
> have received this e-mail in error, please notify the sender and
> immediately, destroy all copies of this email and its attachments. The
> publication, copying, in whole or in part, or use or dissemination in any
> other way of this e-mail and attachments by anyone other than the intended
> person(s) is prohibited.
>
>
>


Re: Stemming nouns ending in 'y'

2016-05-19 Thread Erick Erickson
Mark:

Just a sanity check, was the indexing porter stemmer defined when you
indexed your _first_ document? The admin/analysis page will tell you
what the term is stemmed to at both query and index time.

I'm puzzled by this statement:

bq:  As example, the term 'osteopathy' stemmed with the Porter Stemmer
Filter stems to 'osteopathi', which will match 'osteopath' and
'osteopathic'

Why do you think this will match? The stemmer wouldn't stem 'osteopath' to
the term in the index, namely 'osteopathi', and thus it wouldn't match. Or at
least it shouldn't. So I'm probably missing something here...

Best,
Erick

On Thu, May 19, 2016 at 12:31 PM, Markus Jelsma
 wrote:
> Hello - try the KStem filter. It is better suited for English and doesn't
> show this behaviour.
> Markus
>
>
>
> -Original message-
>> From:Mark Vega 
>> Sent: Thursday 19th May 2016 19:55
>> To: solr-user@lucene.apache.org
>> Subject: Stemming nouns ending in 'y'
>>
>> I am using Apache Nutch v1.10 and SOLR v.5.2.1 to index and search a medical 
>> website and am trying to find out why every stemmer I've tried on certain 
>> nouns in medical terminology ending in 'y' merely replaces the ending 'y' 
>> with an 'I'.  As example, the term 'osteopathy' stemmed with the Porter 
>> Stemmer Filter stems to 'osteopathi', which will match 'osteopath' and 
>> 'osteopathic', but will not match the original term 'osteopathy' itself.  
>> I've seen this with quite a few medical and science nouns ending in 'y'  
>> (though, oddly enough, the word 'terminology' itself stems to 'terminolog' 
>> just as I would expect it to) and am wondering whether there is a different 
>> stemmer I should be using, or if I am just using this one incorrectly.  I am 
>> currently applying the PorterStemFilterFactory to a field of type 'text' in 
>> both the indexing and querying analyzers.  Any comments, suggestions or 
>> explanations would be much appreciated.
>>
>> --
>> Mark F. Vega
>> Programmer/Analyst
>> UC Irvine Libraries - Web Services
>> veg...@uci.edu
>> 949.824.9872
>> --
>>
>>


calculate average memory per document

2016-05-19 Thread vitaly bulgakov
Hi, I have solr 4.2
I am wondering if it is possible to compute an average memory per document
in my index.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/calculate-average-memory-per-document-tp4277865.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fw: Select distinct multiple fields

2016-05-19 Thread thiaga rajan


Thanks Joel for the response. In our requirement there is some logic that needs
to be implemented after fetching the results from Solr, which might have an
impact on working out the pagination.

That is, we have flattened a nested structure into the data layout below. We
need this kind of flat structure because we have to support other use cases
that a hierarchical structure would not, so the search needs to work over the
flat structure below:

| Level1 | Level2 | Level3 |
| 1 | 11 | 111 |
| 1 | 11 | 112 |
| 1 | 11 | 113 |
| 1 | 11 | 114 |

Example - when the customer enters 11, we might need to query this word across
the entire data structure, so we will get all the records, including Level3.
But ideally we need to select only 1,11 (filtering on the current level and the
parent level). Another problem is pagination: we might select 10 records, but
after filtering the levels/parents matching the search keyword, the number of
records might be reduced. So we might need to send another request to Solr to
get the next set, again working out which levels and parents match the search
keyword, until we reach the required row count.

Rather than doing this, is there a way (some kind of plugin, like a
SearchComponent) that would help with the above scenario, or a better way to
achieve this in Solr? Kindly provide your valuable suggestions on this.
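The refill loop described here (request a page, drop rows that fail the level/parent filter, request more until the page is full) can be sketched client-side; `fetch_page` is a hypothetical stand-in for the actual Solr call, and the filtering predicate is supplied by the caller:

```python
def fill_page(fetch_page, keep, page_size, max_batches=50):
    """Collect `page_size` rows that pass `keep`, re-querying as needed.

    fetch_page(start, rows) -> list of result rows (empty when exhausted).
    keep(row) -> True for rows whose level/parent matches the keyword.
    """
    out, start = [], 0
    for _ in range(max_batches):
        batch = fetch_page(start, page_size)
        if not batch:
            break  # no more results in the index
        out.extend(r for r in batch if keep(r))
        start += len(batch)
        if len(out) >= page_size:
            break  # collected a full page of filtered rows
    return out[:page_size]
```

This is exactly the multi-round-trip cost the poster wants to avoid; a server-side SearchComponent would move the `keep` filtering into Solr so one request suffices.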



   On Thursday, 19 May 2016 6:11 PM, Joel Bernstein  
wrote:
 

 
https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface?focusedCommentId=62697742#ParallelSQLInterface-SELECTDISTINCTQueries

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 1:10 PM, Joel Bernstein  wrote:

The SQL interface and Streaming Expressions support selecting multiple distinct 
fields.
The SQL interface can use the JSON facet API or MapReduce to provide the 
results.
The facet function and the unique function are the Streaming Expressions that
the SQL interface calls.
Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 12:41 PM, thiaga rajan 
 wrote:

Hi Team - I have seen that selecting distinct values of multiple fields is not
possible in Solr, and I have seen suggestions involving faceting and grouping.
I have some questions. Is there any kind of plugin or custom implementation
with which we can achieve the same?
1. Using a plugin or a custom implementation, would we be able to select
distinct fields apart from facet and group by? Because pagination is an issue.
For example - we set a page size of 10. If we get 10 records (along with
duplicates), then after removing duplicates we might end up with fewer than 10
results.
Any suggestions on this?





  

Question regarding the SQL interface

2016-05-19 Thread Vachon, Jean-Sébastien
Hi all,

I am planning to migrate our application from SolrJ to the SQL interface and I
have some questions regarding some of Solr's features…


  *   How can we specify multiple search fields for a keyword? Do we have to
handle everything ourselves, as in regular SQL?

SELECT x,y,z FROM collection1 WHERE title='abc' OR description='abc'

Is there a special syntax to allow to search into multiple fields at once?


  *   Do you have to generate separate requests to get faceting information? 
Would translating the following query into its SQL equivalent require 3 queries?

/select?q=title:abc&facet=true&facet.field=xyz&facet.field=def


  *   If our schema contains a fieldType using a custom similarity class… will 
the SQL interface honour that mapping?

  *   The documentation about Streaming Expressions and the SQL interface
refers to terms like “high cardinality” and “very high cardinality”. What
exactly do they mean? Are we talking about hundreds, thousands or millions of
different values? Does this depend on other aspects of the collection, like the
size of the documents?

Thanks for your input and guidance







RE: Stemming nouns ending in 'y'

2016-05-19 Thread Markus Jelsma
Hello - try the KStem filter. It is better suited for English and doesn't show
this behaviour.
Markus

 
 
-Original message-
> From:Mark Vega 
> Sent: Thursday 19th May 2016 19:55
> To: solr-user@lucene.apache.org
> Subject: Stemming nouns ending in 'y'
> 
> I am using Apache Nutch v1.10 and SOLR v.5.2.1 to index and search a medical 
> website and am trying to find out why every stemmer I've tried on certain 
> nouns in medical terminology ending in 'y' merely replaces the ending 'y' 
> with an 'I'.  As example, the term 'osteopathy' stemmed with the Porter 
> Stemmer Filter stems to 'osteopathi', which will match 'osteopath' and 
> 'osteopathic', but will not match the original term 'osteopathy' itself.  
> I've seen this with quite a few medical and science nouns ending in 'y'  
> (though, oddly enough, the word 'terminology' itself stems to 'terminolog' 
> just as I would expect it to) and am wondering whether there is a different 
> stemmer I should be using, or if I am just using this one incorrectly.  I am 
> currently applying the PorterStemFilterFactory to a field of type 'text' in 
> both the indexing and querying analyzers.  Any comments, suggestions or 
> explanations would be much appreciated.
> 
> --
> Mark F. Vega
> Programmer/Analyst
> UC Irvine Libraries - Web Services
> veg...@uci.edu
> 949.824.9872
> --
> 
> 


Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Shawn Heisey
On 5/19/2016 5:26 AM, Dmitry Kan wrote:
> On query side, right above SOF there is SynonymFilter (SF is not present on
> indexing). It does the following:
> organization -> organization, organisation
>
> SOF turns this pair into: organiz, organis.

Can you put the field and fieldType definitions, plus all files
referenced in those definitions (like the stemdict.txt file), someplace
on the Internet we can reach, and give us URL(s) to reach it?  You could
use gist, http://apaste.info, or similar.  Email attachments often don't
work on the mailing list, so I don't recommend using them.

If you put an expiration date on whatever you use, make it at least one
month out.

I see that you mentioned this on IRC as well, EARLY in the morning for
me.  I will be sporadically checking there.

Thanks,
Shawn



Re: Inconsistent Solr document count on Target clouds when replicating data in Solr6 CDCR

2016-05-19 Thread Renaud Delbru

Hi Dmitry,

You can activate debug logging to see more information, such as the number
of documents replicated by the CDCR replicator thread.


However, I think the issue is that the indexes on the target instances are not
refreshed, and therefore some of the indexed documents are not yet visible.
CDCR does not replicate commit operations; it lets the target cluster handle
the refresh. You can try manually executing a commit operation on the target
cluster and see if all the documents appear.
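The manual commit suggested above can be issued over HTTP against the target collection; a small sketch building the URL (`/update?commit=true` is the standard Solr commit request; the host and the "demo" collection name here are illustrative):

```python
from urllib.parse import urlencode

def commit_url(base_url, collection):
    """URL for an explicit hard commit on a collection.

    By default a hard commit opens a new searcher, which is what makes
    the replicated documents visible on the target.
    """
    params = {"commit": "true"}
    return "%s/solr/%s/update?%s" % (base_url.rstrip("/"), collection,
                                     urlencode(params))

# e.g.: urllib.request.urlopen(commit_url("http://target-host:8983", "demo"))
```

If the counts match after this, the mismatch was only searcher visibility, not lost updates.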


Kind Regards
--
Renaud Delbru

On 19/05/16 17:39, dmitry.medve...@barclays.com wrote:

I've come across a weird problem which I'm trying to debug at the moment, and 
was just wondering if anyone has stumbled across it too:

I have an active-passive-passive configuration (1 Source cloud, 2 targets), and 
NOT all the documents are being replicated to the target clouds. Example: 3 
docs are being pushed/indexed on the Source cloud, S1, S2, S3, and only 2 docs 
can be found (almost immediately) on the Target clouds, say T1, T3. The 
behavior is NOT consistent.

I feel like it's a configuration issue, but it could also be a bug. How can I 
debug this issue?

What log files should I examine?

I couldn't find anything in the logs (of both the Source & Target clouds).



Source configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.88.52.219:9983,10.36.75.4:9983</str>
    <str name="source">demo</str>
    <str name="target">demo</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">10</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

Target(s) configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-proc-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-proc-chain</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>




Thnx,
Dmitry Medvedev
Tech search leader
BARCLAYS CAPITAL
Search Platform Engineering
Global Technology Infrastructure Services  (GTIS)
Barclays Capital, Atidim High-Tech Industrial Park, Tel Aviv 61580
* DDI : +972-3-5452462 * Mobile : +972-545874521
* 
dmitry.medve...@barclayscapital.com

P Please consider the environment before printing this email


___

This message is for information purposes only, it is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is not intended for retail customer use. Intended for 
recipient only. This message is subject to the terms at: 
www.barclays.com/emaildisclaimer.

For important disclosures, please see: 
www.barclays.com/salesandtradingdisclaimer regarding market commentary from 
Barclays Sales and/or Trading, who are active market participants; and in 
respect of Barclays Research, including disclosures relating to specific 
issuers, please see http://publicresearch.barclays.com.

___





Stemming nouns ending in 'y'

2016-05-19 Thread Mark Vega
I am using Apache Nutch v1.10 and SOLR v.5.2.1 to index and search a medical 
website and am trying to find out why every stemmer I've tried on certain nouns 
in medical terminology ending in 'y' merely replaces the ending 'y' with an 
'I'.  As an example, the term 'osteopathy' stemmed with the Porter Stemmer Filter 
stems to 'osteopathi', which will match 'osteopath' and 'osteopathic', but will 
not match the original term 'osteopathy' itself.  I've seen this with quite a 
few medical and science nouns ending in 'y'  (though, oddly enough, the word 
'terminology' itself stems to 'terminolog' just as I would expect it to) and am 
wondering whether there is a different stemmer I should be using, or if I am 
just using this one incorrectly.  I am currently applying the 
PorterStemFilterFactory to a field of type 'text' in both the indexing and 
querying analyzers.  Any comments, suggestions or explanations would be much 
appreciated.
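The asymmetry between 'osteopathy' and 'terminology' falls out of which Porter rules fire. A simplified sketch of just the two relevant rules (step 1c turns a final 'y' into 'i' when the stem contains a vowel; a later amendment to step 2, present in Lucene's implementation, then rewrites a trailing 'logi' to 'log'). This illustrates only those two rules and is not a full Porter implementation:

```python
def _has_vowel(stem):
    # Simplified check; the real algorithm also treats 'y' as a vowel
    # in some positions.
    return any(c in "aeiou" for c in stem)

def porter_y_rules(word):
    """Apply only Porter step 1c and the step-2 'logi' -> 'log' rule."""
    if word.endswith("y") and _has_vowel(word[:-1]):
        word = word[:-1] + "i"   # step 1c: osteopathy -> osteopathi
    if word.endswith("logi"):
        word = word[:-1]         # step 2:  terminologi -> terminolog
    return word
```

'terminology' hits both rules (y -> i, then logi -> log), while 'osteopathy' only hits step 1c, stranding the 'i' that prevents a match on the unstemmed surface form.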

--
Mark F. Vega
Programmer/Analyst
UC Irvine Libraries - Web Services
veg...@uci.edu
949.824.9872
--



Re: Select distinct multiple fields

2016-05-19 Thread Joel Bernstein
https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface?focusedCommentId=62697742#ParallelSQLInterface-SELECTDISTINCTQueries

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 1:10 PM, Joel Bernstein  wrote:

> The SQL interface and Streaming Expressions support selecting multiple
> distinct fields.
>
> The SQL interface can use the JSON facet API or MapReduce to provide the
> results.
>
> The facet function and the unique function are the Streaming Expressions
> that the SQL interface calls.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, May 19, 2016 at 12:41 PM, thiaga rajan <
> ecethiagu2...@yahoo.co.in.invalid> wrote:
>
>> Hi Team - I have seen select distinct multiple fields is not possible in
>> Solr and i have seen suggestions coming up on faceting and grouping. I have
>> some questions. Is there any with any kind of plugins/custom implementation
>> we can achieve the same
>> 1. Using any plugin or through custom implementation whether we will be
>> able to achieve the select distinct fields apart from facet and group
>> by...Because the pagination is kind of issue.
>> For example - We are setting a pagination of 10. If we are getting 10
>> records (along with the duplicates) then we might ending up a getting the
>> results less than 10.
>> Any suggestions on this?
>
>
>


Re: Select distinct multiple fields

2016-05-19 Thread Joel Bernstein
The SQL interface and Streaming Expressions support selecting multiple
distinct fields.

The SQL interface can use the JSON facet API or MapReduce to provide the
results.

The facet function and the unique function are the Streaming Expressions
that the SQL interface calls.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 12:41 PM, thiaga rajan <
ecethiagu2...@yahoo.co.in.invalid> wrote:

> Hi Team - I have seen select distinct multiple fields is not possible in
> Solr and i have seen suggestions coming up on faceting and grouping. I have
> some questions. Is there any with any kind of plugins/custom implementation
> we can achieve the same
> 1. Using any plugin or through custom implementation whether we will be
> able to achieve the select distinct fields apart from facet and group
> by...Because the pagination is kind of issue.
> For example - We are setting a pagination of 10. If we are getting 10
> records (along with the duplicates) then we might ending up a getting the
> results less than 10.
> Any suggestions on this?


Inconsistent Solr document count on Target clouds when replicating data in Solr6 CDCR

2016-05-19 Thread dmitry.medvedev
I've come across a weird problem which I'm trying to debug at the moment, and 
was just wondering if anyone has stumbled across it too:

I have an active-passive-passive configuration (1 Source cloud, 2 targets), and 
NOT all the documents are being replicated to the target clouds. Example: 3 
docs are being pushed/indexed on the Source cloud, S1, S2, S3, and only 2 docs 
can be found (almost immediately) on the Target clouds, say T1, T3. The 
behavior is NOT consistent.

I feel like it's a configuration issue, but it could also be a bug. How can I 
debug this issue?

What log files should I examine?

I couldn't find anything in the logs (of both the Source & Target clouds).



Source configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.88.52.219:9983,10.36.75.4:9983</str>
    <str name="source">demo</str>
    <str name="target">demo</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">10</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

Target(s) configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-proc-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-proc-chain</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>




Thnx,
Dmitry Medvedev
Tech search leader
BARCLAYS CAPITAL
Search Platform Engineering
Global Technology Infrastructure Services  (GTIS)
Barclays Capital, Atidim High-Tech Industrial Park, Tel Aviv 61580
* DDI : +972-3-5452462 * Mobile : +972-545874521
* 
dmitry.medve...@barclayscapital.com

P Please consider the environment before printing this email




Select distinct multiple fields

2016-05-19 Thread thiaga rajan
Hi Team - I have seen that selecting distinct values of multiple fields is not
possible in Solr, and I have seen suggestions involving faceting and grouping.
I have some questions. Is there any kind of plugin or custom implementation
with which we can achieve the same?
1. Using a plugin or a custom implementation, would we be able to select
distinct fields apart from facet and group by? Because pagination is an issue.
For example - we set a page size of 10. If we get 10 records (along with
duplicates), then after removing duplicates we might end up with fewer than 10
results.
Any suggestions on this?

Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Jeff Wartes
That case related to consistency after a ZK outage or network connectivity
issue. Your case is standard operation, so I'm not sure that's really the same
thing. I'm aware of a few issues that can happen if ZK connectivity goes
wonky, which I hope are fixed in SOLR-8697.

This one might be a closer match to your problem though: 
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3CCAOWq+=iePCJjnQiSqxgDVEPv42Pi7RUtw0X0=9f67mpcm99...@mail.gmail.com%3E




On 5/19/16, 9:10 AM, "Aleksey Mezhva"  wrote:

>Bump.
>
>this thread is with someone having a similar issue:
>
>https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E
>
>It seems like this is not really fixed in 5.4/6.0?
>
>
>Aleksey
>
>From: Steve Weiss 
>Date: Tuesday, May 17, 2016 at 7:25 PM
>To: "solr-user@lucene.apache.org" 
>Cc: Aleksey Mezhva , Hans Zhou 
>Subject: Re: SolrCloud replicas consistently out of sync
>
>Gotcha - well that's nice.  Still, we seem to be permanently out of sync.
>
>I see this thread with someone having a similar issue:
>
>https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E
>
>It seems like this is not really fixed in 5.4/6.0?  Is there any version of 
>SolrCloud where this wasn't yet a problem that we could downgrade to?
>
>--
>Steve
>
>On Tue, May 17, 2016 at 6:23 PM, Markus Jelsma 
>> wrote:
>Hi, thats a known issue and unrelated:
>https://issues.apache.org/jira/browse/SOLR-9120
>
>M.
>
>
>-Original message-
>> From:Stephen Weiss >
>> Sent: Tuesday 17th May 2016 23:10
>> To: solr-user@lucene.apache.org; Aleksey 
>> Mezhva >; Hans Zhou 
>> >
>> Subject: Re: SolrCloud replicas consistently out of sync
>>
>> I should add - looking back through the logs, we're seeing frequent errors 
>> like this now:
>>
>> 78819692 WARN  (qtp110456297-1145) [   ] o.a.s.h.a.LukeRequestHandler Error 
>> getting file length for [segments_4o]
>> java.nio.file.NoSuchFileException: 
>> /var/solr/data/instock_shard5_replica1/data/index.20160516230059221/segments_4o
>>
>> --
>> Steve
>>
>>
>> On Tue, May 17, 2016 at 5:07 PM, Stephen Weiss 
>> >>
>>  wrote:
>> OK, so we did as you suggest, read through that article, and we reconfigured 
>> the autocommit to:
>>
>> <autoCommit>
>>   <maxTime>${solr.autoCommit.maxTime:3}</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
>> </autoSoftCommit>
>>
>> However, we see no change, aside from the fact that it's clearly committing 
>> more frequently.  I will say on our end, we clearly misunderstood the 
>> difference between soft and hard commit, but even now having it configured 
>> this way, we are still totally out of sync, long after all indexing has 
>> completed (it's been about 30 minutes now).  We manually pushed through a 
>> commit on the whole collection as suggested, however, all we get back for 
>> that is o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping 
>> IW.commit., which makes sense, because it was all committed already anyway.
>>
>> We still currently have all shards mismatched:
>>
>> instock_shard1   replica 1: 30788491 replica 2: 30778865
>> instock_shard10   replica 1: 30973059 replica 2: 30971874
>> instock_shard11   replica 2: 31036815 replica 1: 31034715
>> instock_shard12   replica 2: 30177084 replica 1: 30170511
>> instock_shard13   replica 2: 30608225 replica 1: 30603923
>> instock_shard14   replica 2: 30755739 replica 1: 30753191
>> instock_shard15   replica 2: 30891713 replica 1: 30891528
>> instock_shard16   replica 1: 30818567 replica 2: 30817152
>> instock_shard17   replica 1: 30423877 replica 2: 30422742
>> instock_shard18   replica 2: 30874979 replica 1: 30872223
>> instock_shard19   replica 2: 30917208 replica 1: 3090
>> instock_shard2   replica 1: 31062339 replica 2: 31060575
>> instock_shard20   replica 1: 30192046 replica 2: 30190893
>> instock_shard21   replica 2: 30793817 replica 1: 30791135
>> instock_shard22   replica 2: 30821521 replica 1: 30818836
>> instock_shard23   replica 2: 30553773 replica 1: 30547336
>> instock_shard24   replica 1: 30975564 replica 2: 30971170
>> instock_shard25   replica 1: 30734696 replica 2: 30731682
>> instock_shard26   replica 1: 31465696 replica 2: 31464738
>> instock_shard27   replica 1: 30844884 replica 2: 30842445
>> instock_shard28   replica 2: 30549826 replica 1: 30547405
>> instock_shard29   replica 2: 3063 replica 1: 30634091
>> instock_shard3   replica 1: 30930723 replica 2: 30926483
>> 

Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Aleksey Mezhva
Bump.

this thread is with someone having a similar issue:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E

It seems like this is not really fixed in 5.4/6.0?


Aleksey

From: Steve Weiss 
Date: Tuesday, May 17, 2016 at 7:25 PM
To: "solr-user@lucene.apache.org" 
Cc: Aleksey Mezhva , Hans Zhou 
Subject: Re: SolrCloud replicas consistently out of sync

Gotcha - well that's nice.  Still, we seem to be permanently out of sync.

I see this thread with someone having a similar issue:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E

It seems like this is not really fixed in 5.4/6.0?  Is there any version of 
SolrCloud where this wasn't yet a problem that we could downgrade to?

--
Steve

On Tue, May 17, 2016 at 6:23 PM, Markus Jelsma 
> wrote:
Hi, thats a known issue and unrelated:
https://issues.apache.org/jira/browse/SOLR-9120

M.


-Original message-
> From:Stephen Weiss >
> Sent: Tuesday 17th May 2016 23:10
> To: solr-user@lucene.apache.org; Aleksey 
> Mezhva >; Hans Zhou 
> >
> Subject: Re: SolrCloud replicas consistently out of sync
>
> I should add - looking back through the logs, we're seeing frequent errors 
> like this now:
>
> 78819692 WARN  (qtp110456297-1145) [   ] o.a.s.h.a.LukeRequestHandler Error 
> getting file length for [segments_4o]
> java.nio.file.NoSuchFileException: 
> /var/solr/data/instock_shard5_replica1/data/index.20160516230059221/segments_4o
>
> --
> Steve
>
>
> On Tue, May 17, 2016 at 5:07 PM, Stephen Weiss 
> >>
>  wrote:
> OK, so we did as you suggest, read through that article, and we reconfigured 
> the autocommit to:
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:3}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
> </autoSoftCommit>
>
> However, we see no change, aside from the fact that it's clearly committing 
> more frequently.  I will say on our end, we clearly misunderstood the 
> difference between soft and hard commit, but even now having it configured 
> this way, we are still totally out of sync, long after all indexing has 
> completed (it's been about 30 minutes now).  We manually pushed through a 
> commit on the whole collection as suggested, however, all we get back for 
> that is o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping 
> IW.commit., which makes sense, because it was all committed already anyway.
>
> We still currently have all shards mismatched:
>
> instock_shard1   replica 1: 30788491 replica 2: 30778865
> instock_shard10   replica 1: 30973059 replica 2: 30971874
> instock_shard11   replica 2: 31036815 replica 1: 31034715
> instock_shard12   replica 2: 30177084 replica 1: 30170511
> instock_shard13   replica 2: 30608225 replica 1: 30603923
> instock_shard14   replica 2: 30755739 replica 1: 30753191
> instock_shard15   replica 2: 30891713 replica 1: 30891528
> instock_shard16   replica 1: 30818567 replica 2: 30817152
> instock_shard17   replica 1: 30423877 replica 2: 30422742
> instock_shard18   replica 2: 30874979 replica 1: 30872223
> instock_shard19   replica 2: 30917208 replica 1: 3090
> instock_shard2   replica 1: 31062339 replica 2: 31060575
> instock_shard20   replica 1: 30192046 replica 2: 30190893
> instock_shard21   replica 2: 30793817 replica 1: 30791135
> instock_shard22   replica 2: 30821521 replica 1: 30818836
> instock_shard23   replica 2: 30553773 replica 1: 30547336
> instock_shard24   replica 1: 30975564 replica 2: 30971170
> instock_shard25   replica 1: 30734696 replica 2: 30731682
> instock_shard26   replica 1: 31465696 replica 2: 31464738
> instock_shard27   replica 1: 30844884 replica 2: 30842445
> instock_shard28   replica 2: 30549826 replica 1: 30547405
> instock_shard29   replica 2: 3063 replica 1: 30634091
> instock_shard3   replica 1: 30930723 replica 2: 30926483
> instock_shard30   replica 2: 30904528 replica 1: 30902649
> instock_shard31   replica 2: 31175813 replica 1: 31174921
> instock_shard32   replica 2: 30932837 replica 1: 30926456
> instock_shard4   replica 2: 30758100 replica 1: 30754129
> instock_shard5   replica 2: 31008893 replica 1: 31002581
> instock_shard6   replica 2: 31008679 replica 1: 31005380
> instock_shard7   replica 2: 30738468 replica 1: 30737795
> instock_shard8   replica 2: 30620929 replica 1: 30616715
> instock_shard9   replica 1: 31071386 replica 2: 31066956
>
> The fact that the min_rf numbers aren't coming back as 2 seems to indicate to 
> me that documents simply aren't making it to both 

Solr join between documents

2016-05-19 Thread elisabeth benoit
Hello all,

I was wondering if there was a solr solution for a problem I have (and I'm
not the only one I guess)

We use Solr as a search engine for addresses. We sometimes have requests
like, for instance:

street A close to street B City postcode

I was wondering if some kind of join between two documents is possible in
solr?

The query would be: find union of two documents matching all words in query.

Those documents have a latitude and a longitude, and we would fix a max
distance between two documents to be eligible for a join.

Is there a way to do this?

Best regards,
Elisabeth


Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Ahmet Arslan
Hi,

EmbeddedSolrServer bypasses the servlet container.
Please see : 
http://find.searchhub.org/document/a88f669d38513a76




On Thursday, May 19, 2016 6:23 PM, Roman Slavik  wrote:
Hi Ahmet,
thanks for your response, I appreciate it.

I thought that EmbeddedSolrServer is just a wrapper around Solr core
functionality. Solr 4.7.2 is (was?) distributed as a war file and I didn't
find any mention of a compatibility problem with Tomcat. 
Maybe with Jetty it would work slightly faster, but I don't think this
causes the problem we have.


Roman



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-4-7-2-slowing-down-over-time-tp4277519p429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Roman Slavik
Hi Ahmet,
thanks for your response, I appreciate it.

I thought that EmbeddedSolrServer is just a wrapper around Solr core
functionality. Solr 4.7.2 is (was?) distributed as a war file and I didn't
find any mention of a compatibility problem with Tomcat. 
Maybe with Jetty it would work slightly faster, but I don't think this
causes the problem we have.

Roman



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-4-7-2-slowing-down-over-time-tp4277519p429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Get documents having a boolean field:false or not having the field at all

2016-05-19 Thread Sebastian Riemer
Hi,

I've introduced a new boolean field "is_deleted_b_ns" on my objects which I 
index with Solr. I am using dynamic field definitions ("b" indicating Boolean, 
"ns" for "not stored").

Since the field did not exist while the index was built, none of my documents 
currently has that field indexed.

My queries from now on must always include this new boolean field: they either 
ask the index for is_deleted_b_ns:false or for is_deleted_b_ns:true. However, since the 
field is not yet indexed, both queries return 0 results.

I see two ways I could go from here:

1)  Rebuild the whole index so that all documents index this newly 
added field as well (time-consuming); the above queries will then return the 
expected results.

2)  Relax the query by OR-combining is_deleted_b_ns:false with 
-is_deleted_b_ns:[* TO *]

That would mean "give me the documents where the flag is false or where it does 
not exist at all"
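Option 2) can be sent as a single filter query. A minimal sketch of building the request parameters (the *:* anchor is what makes the purely negative clause valid as a standalone clause inside the OR; the field name is the one from above):

```python
from urllib.parse import urlencode

# "flag is false OR the field does not exist at all".
# A purely negative clause like -field:[* TO *] must be anchored
# with *:* to be a valid standalone clause inside the OR.
fq = "is_deleted_b_ns:false OR (*:* -is_deleted_b_ns:[* TO *])"

# Ready to append to a /select request.
params = urlencode({"q": "*:*", "fq": fq})
print(params)
```

Because fq results are cached in the filterCache, the extra range clause should mostly cost on the first evaluation.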

Doing 1) is OK for now since this is a big change and we're not in production 
yet. Doing 2) feels kind of bad since I don't know whether it's a big performance 
hit. I also don't like that it makes my program code react to the current state of 
the index: someday the index will be up to date again, and then I'd be left with 
this broader query logic in my program even though it is no longer needed.

However, 1) will be a problem once we are in production. Sure, we won't 
have schema changes this big all the time, but one never knows.

What's your opinion on this? May be there is another option as well?

Best regards,
Sebastian

Mit freundlichen Grüßen
Sebastian Riemer, BSc


LITTERA Software & Consulting GmbH
A-6060 Hall i.T., Haller Au 19a
Telefon: +43(0) 50 765 000, Fax: +43(0) 50 765 118
Sitz: Hall i.T., eingetragen beim Handelsgericht Innsbruck,
Firmenbuch-Nr. FN 295807k, geschäftsführender Gesellschafter: Albert 
Unterkircher

D-80637 München, Landshuter Allee 8-10
Telefon: +49(0) 89 919 29 122, Fax: +49(0) 89 919 29 123
Sitz: München, eingetragen beim Amtsgericht München
unter HRB 103698, Geschäftsführer: Albert Unterkircher
E-Mail: off...@littera.eu
Homepage: www.littera.eu




Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Roman Slavík
Hi,
thanks for your response!

We use javamelody for some basic statistics about app. Here are some graphs
from last 24 hours:
http://imgur.com/a/OQxnb

The first graph shows memory used by the application. The second graph shows
how search time rapidly increased.
At 13:40 a necessary app restart happened, and at 1:30 the index optimize job
started. Everything has been fine today, no restart needed.

Based on these graphs, I don't think the problem is tight memory.

Can you explain that point about autoCommit?
Currently we create a list of SolrInputDocuments, add them to EmbeddedSolrServer
and then call an explicit hard commit, EmbeddedSolrServer.commit(), with
both waitFlush and waitSearcher set to true (could that be the problem?).
So no autoCommit is used.
The main reason is that after somebody makes some document modifications,
we insert the documents' IDs into a database. After some time all these documents
are reindexed, committed and removed from the database. So everything is
persistent: modified documents are either in the database or already committed.
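For reference, switching to autoCommit (as Joel suggested) might look roughly like this in solrconfig.xml; the one-minute values below are illustrative placeholders, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush the tlog and fsync segments, but do not open
       a new searcher (openSearcher=false keeps it cheap) -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: makes changes visible to searches -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place the client only adds documents and never calls commit() itself, which avoids overlapping searchers from too-frequent explicit commits.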


Roman


2016-05-18 23:24 GMT+02:00 Joel Bernstein :

> One thing to investigate is whether your caches are too large and gradually
> filling up memory. It does sound like memory is getting tighter over time.
> A memory profiler would be helpful in figuring out memory issues.
>
> Moving to autoCommits would also eliminate any slowness due to overlapping
> searchers from committing too frequently.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 18, 2016 at 10:39 AM, Roman Slavík  wrote:
>
> > Hi all,
> >
> > we're using solr in our application and have problem that both searching
> > and indexing is slowing down over time.
> >
> > Versions:
> > -
> > Java 1.7
> > Solr 4.7.2
> > Lucene 4.1 (luceneMatchVersion param in solrconfig.xml)
> >
> > App architecture:
> > -
> > We don't use solr as a standalone application; we integrated it into our
> > app with the solrj library.
> > It has 1 CoreContainer with 3 cores (let's call them alpha, beta, gamma).
> > EmbeddedSolrServer is used as java class to work with each core.
> > Application does both searching and indexing.
> >
> > What's going on:
> > 
> > After Tomcat restart everything works great for 2-3 days. But after
> > this time solr (all cores) starts slowing down until it's unusable and we
> > have to restart Tomcat. Then it works fine again.
> > For example, search time for a really complex query is 1.5 s when it works
> > fine. But then it rises to more than 1 min. The same issue occurs with indexing:
> > first fast, but then slow.
> >
> > Searching:
> > --
> > Core alpha is used mainly for normal search. But sometimes for faceting
> too.
> > Beta and gamma are only for facets.
> > alpha: 25000 queries/day
> > beta: 7000 queries/day
> > gamma: 7000 queries/day
> > We do lots of query joins, sometimes cross cores.
> >
> > Indexing:
> > -
> > We commit changes continuously over the day. The number of commits is limited
> > to 1 commit/min for all three cores. So we do at most 1440 commits daily. One
> > commit contains between 1 and 100 docs.
> > Method EmbeddedSolrServer.add(SolrInputDocument) is used and in the end
> > EmbeddedSolrServer.commit().
> > Every night we call EmbeddedSolrServer.optimize() on each core.
> >
> > Index size:
> > ---
> > alpha: 13,5 GB
> > beta: 300 MB
> > gamma: 600 MB
> >
> > Hardware:
> > -
> > Ubuntu 14.04
> > 8 core CPU
> > java heap space 22 GB RAM
> > SSD drive with more than 50 GB free space
> >
> > Solr config (Same configuration is used for all cores):
> > ---
> >  > class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
> > LUCENE_41
> > 
> > false
> > 10
> >
> > 32
> > 1
> > 1000
> > 1
> >
> > native
> > false
> > true
> >
> > 
> >   1
> >   0
> > 
> > 
> >
> > 
> >
> >  > autowarmCount="0"/>
> >  > autowarmCount="0"/>
> >  > autowarmCount="0"/>
> >
> > true
> > false
> > 2
> >
> >
> > Conclusion:
> > ---
> > Is something wrong in configuration? Or is this some kind of bug? Or...?
> > Can you give me some advice how to resolve this problem please?
> >
> >
> > Roman
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Abdel Belkasri
Hi Renaud,

I was not reloading the collection, but I just did. It didn't help.

The error is always the same:


   - *gettingstarted_shard1_replica2:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
   Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to
   instantiate org.apache.solr.update.UpdateHandler


On Thu, May 19, 2016 at 9:17 AM, Renaud Delbru 
wrote:

> Hi Abdel,
>
> have you reloaded the collection [1] after uploading the configuration to
> zookeeper ?
>
> [1]
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2
>
> --
> Renaud Delbru
>
>
> On 16/05/16 17:29, Abdel Belkasri wrote:
>
>> Thanks Renaud.
>>
>> Here is my setup:
>>
>> 1- I have created 2 sites: Main (source) and DR (target).
>> 2- Both sites are the same before configuring CDCR
>> 3- The collections (source and target) are created before configuring CDCR
>> 4- collections are created using interactive mode: accepting most defaults
>> except the ports (gettingstarted collection)
>> 5- I have a zookeeper ensemble too.
>> 6- I change the solrconfig.xml, then I upload using the command:
>> # upload configset to zookeeper
>> zkcli.bat -cmd upconfig -zkhost  localhost:2181 -confname gettingstarted
>> -solrhome C:\solr\solr-6-cloud\solr-6.0.0 -confdir
>> C:\solr\solr-6-cloud\solr-6.0.0\server\solr\configsets\basic_configs\conf
>>
>> Renaud can you send your confi files...
>>
>> Thanks,
>> --Abdel.
>>
>> On Mon, May 16, 2016 at 12:16 PM, Satvinder Singh <
>> satvinder.si...@nc4.com>
>> wrote:
>>
>> Thank you.
>>>
>>> To summarize this is what I have, all VMS running on Centos7 :
>>>
>>> Source Side
>>>  |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
>>> 2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
>>>  |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
>>> 6.0) (Java 1.8.0_91)
>>>  |___ sample_techproducts_config copied as 'liferay', and used to
>>> create collections, that is where I am
>>>   modifying the solrconfig.xml
>>>
>>>
>>> Target Side
>>>  |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
>>> 2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
>>>  |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
>>> 6.0) (Java 1.8.0_91)
>>>  |___ sample_techproducts_config copied as 'liferay', and used to
>>> create collections, that is where I am
>>>   modifying the solrconfig.xml
>>>
>>>
>>> Thanks
>>> Satvinder Singh
>>> Security Systems Engineer
>>> satvinder.si...@nc4.com
>>> 703.682.6000 x276 direct
>>> 703.989.8030 cell
>>> www.NC4.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -Original Message-
>>> From: Renaud Delbru [mailto:renaud@siren.solutions]
>>> Sent: Monday, May 16, 2016 11:59 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
>>>
>>> Thanks Satvinder,
>>> Tomorrow, I'll try to reproduce the issue with your steps and will let
>>> you
>>> know.
>>>
>>> Regards
>>> --
>>> Renaud Delbru
>>>
>>> On 16/05/16 16:53, Satvinder Singh wrote:
>>>
 Hi,

 So the way I am doing it is, for both for the Target and Source side, I

>>> took a copy of the sample_techproducts_config configset and created one
>>> configset. Then I modified the solrconfig.xml in there, both for the
>>> Target
>>> and Source side. And then created the collection, and I get the errors. I
>>> get the error if I create a new collection or try to reload an existing
>>> collection after the solrconfig update.
>>>
 Attached is the log and configs.
 Thanks

 Satvinder Singh



 Security Systems Engineer
 satvinder.si...@nc4.com
 703.682.6000 x276 direct
 703.989.8030 cell
 www.NC4.com








 -Original Message-
 From: Renaud Delbru [mailto:renaud@siren.solutions]
 Sent: Monday, May 16, 2016 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

 Hi,

 I have tried to reproduce the problem, but was unable to.
 I have downloaded the Solr 6.0 distribution, added to the solr config

>>> the cdcr request handler and modified the update handler to register the
>>> CdcrUpdateLog, then start Solr in cloud mode and created a new collection
>>> using my solr config. The cdcr request handler starts properly and does
>>> not
>>> complain about the update log.
>>>

 Could you provide more background on how to reproduce the issue ? E.g.,

>>> how do you create a new collection with the cdcr configuration.
>>>
 Are you trying to configure CDCR on collections that were created prior

>>> to the CDCR configuration ?
>>>

 @Erik: I have noticed a small issue in the CDCR page of the reference

>>> guide. In the code snippet in Configuration -> Source Configuration, the
>>>  element is nested within the .
>>>

Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Abdel Belkasri
Thanks Renaud.

For me I did this:

1) I created solr cloud using the interactive mode (accepting most defaults):
$ bin/solr -e cloud

I do that for both the DataCenter1 (DC1) and DataCenter2 (DC2) (in fact
both are on the same machine, just different ports)
By now both clouds have a collection called gettingstarted, each cloud has
2 nodes, 2 shards

2) Each dc is using an ensemble of 3 zookeepers

3) I used the basic config that comes with the install and I upload
configset to zookeeper using:
zkcli.bat -cmd upconfig -zkhost  localhost:2181 -confname gettingstarted
-solrhome C:\solr\solr-6-cloud\solr-6.0.0 -confdir
C:\solr\solr-6-cloud\solr-6.0.0\server\solr\configsets\basic_configs\conf

4) I tested the sites both works fine

5) Then I shut them all (except the zookeepers)

6) change the solrconfig.xml for DC1 to be source

7) change the solrconfig.xml for DC2 to be destination

8) I upload the confg again (update) to zookepper using command in 3

9) then start the 2 clouds using:
# start node 1
solr.cmd start -cloud -p 8985 -s
"C:\solr\solr-6-cloud\solr-6.0.0-dr\example\cloud\node1\solr" -z
localhost:3181
# start node 2
solr.cmd start -cloud -p 7575 -s
"C:\solr\solr-6-cloud\solr-6.0.0-dr\example\cloud\node2\solr" -z
localhost:3181

10) do the same for DC2

The clouds both throw that error about update log that we started with


A question: at what point do you enable CDCR?

Best Regards,
--Abdel.




On Thu, May 19, 2016 at 7:12 AM, Renaud Delbru 
wrote:

> I have reproduced your steps and the cdcr request handler started
> successfully. I have attached to this mail the config sets I have used. It
> is simply the sample_techproducts_config configset with your solrconfig.xml.
>
> I have used solr 6.0.0 with the following commands:
>
> $ ./bin/solr start -cloud
>
> $ ./bin/solr create_collection -c test_cdcr -d cdcr_configs
>
> Connecting to ZooKeeper at localhost:9983 ...
> Uploading /solr-6.0.0/server/solr/configsets/cdcr_configs/conf for config
> test_cdcr to ZooKeeper at localhost:9983
>
> Creating new collection 'test_cdcr' using command:
>
> http://localhost:8983/solr/admin/collections?action=CREATE=test_cdcr=1=1=1=test_cdcr
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":5765},
>   "success":{"127.0.1.1:8983_solr":{
>   "responseHeader":{
> "status":0,
> "QTime":4426},
>   "core":"test_cdcr_shard1_replica1"}}}
>
> $ curl http://localhost:8983/solr/test_cdcr/cdcr?action=STATUS
>
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
>   <lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
> </response>
>
>
>
> The difference is that I have used the embedded zookeeper, not a separate
> ensemble.
>
> Could you please provide the commands you used to create the collection ?
>
> Kind Regards
> --
> Renaud Delbru
>
>
> On 16/05/16 16:55, Satvinder Singh wrote:
>
>> I also am using a zk ensemble with 3 nodes on each side.
>>
>> Thanks
>>
>>
>> Satvinder Singh
>>
>>
>>
>> Security Systems Engineer
>> satvinder.si...@nc4.com
>> 703.682.6000 x276 direct
>> 703.989.8030 cell
>> www.NC4.com
>>
>>
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Satvinder Singh [mailto:satvinder.si...@nc4.com]
>> Sent: Monday, May 16, 2016 11:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Need Help with Solr 6.0 Cross Data Center Replication
>>
>> Hi,
>>
>> So the way I am doing it is, for both the Target and Source side, I
>> took a copy of the sample_techproducts_config configset and created one
>> configset. Then I modified the solrconfig.xml in there, both for the Target
>> and Source side. And then created the collection, and I get the errors. I
>> get the error if I create a new collection or try to reload an existing
>> collection after the solrconfig update.
>> Attached is the log and configs.
>> Thanks
>>
>> Satvinder Singh
>>
>>
>>
>> Security Systems Engineer
>> satvinder.si...@nc4.com
>> 703.682.6000 x276 direct
>> 703.989.8030 cell
>> www.NC4.com
>>
>>
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Renaud Delbru [mailto:renaud@siren.solutions]
>> Sent: Monday, May 16, 2016 11:45 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
>>
>> Hi,
>>
>> I have tried to reproduce the problem, but was unable to.
>> I have downloaded the Solr 6.0 distribution, added to the solr config the
>> cdcr request handler and modified the update handler to register the
>> CdcrUpdateLog, then start Solr in cloud mode and created a new collection
>> using my solr config. The cdcr request handler starts properly and does not
>> complain about the update log.
>>
>> Could you provide more background on how to reproduce the issue ? E.g.,
>> how do you create a new collection with the cdcr configuration.
>> Are you trying to configure CDCR on collections that were created prior
>> to the CDCR configuration ?
>>
>> @Erik: I have noticed a small issue in the CDCR page of the reference
>> guide. In the code snippet in Configuration -> Source Configuration, the
>>  

Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-19 Thread Steve Rowe
Peter,

It’s an interesting idea.  Could you make a Solr JIRA?

I don’t know where the field type specification would go, but providing a 
mechanism to specify field type for previously non-existent fields, outside of 
the field names themselves, seems useful.

In the meantime, do you know about field aliasing?  

1. You can get results back that rename fields to whatever you want: see the 
section “Field Name Aliases” here: 
.

2. On the query side, eDisMax can perform aliasing so that user-specified field 
names in queries get mapped to one or more indexed fields: look for “alias” in 
.
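For example, point 2 might be wired up as eDisMax request-handler defaults (the `color`/`color_s` names here are hypothetical, matching Peter's earlier example; the same parameters can also be passed per request):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- a user can type color:red; eDisMax rewrites it to search color_s -->
    <str name="f.color.qf">color_s</str>
  </lst>
</requestHandler>
```

Combined with response-side aliasing in fl, the postfixed names need never appear in user-facing queries or results.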

--
Steve
www.lucidworks.com

> On May 19, 2016, at 4:43 AM, Horváth Péter Gergely 
>  wrote:
> 
> Hi Steve,
> 
> Yes, I know the schema API, however I do not want to specify the field type
> programmatically for every single field.
> 
> I would like to be able to specify the field type when it is being added
> (similar to the name postfixes, but without affecting the field names).
> 
> Thanks,
> Peter
> 
> 
> 2016-05-17 17:08 GMT+02:00 Steve Rowe :
> 
>> Hi Peter,
>> 
>> Are you familiar with the Schema API?: <
>> https://cwiki.apache.org/confluence/display/solr/Schema+API>
>> 
>> You can use it to create fields, field types, etc. prior to ingesting your
>> data.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
>> peter.gergely.horv...@gmail.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> By default Solr allows you to define the type of a dynamic field by
>>> appending a post-fix to the name itself. E.g. creating a color_s field
>>> instructs Solr to create a string field. My understanding is that if we
>> do
>>> this, all queries must refer the post-fixed field name as well. So
>>> instead of a query like color:"red", we will have to write something like
>>> color_s:"red" -- and so on for other field types as well.
>>> 
>>> I am wondering if it is possible to specify the data type used for a
>> field
>>> in Solr 6.0.0, without having to modify the field name. (Or at least in a
>>> way that would allow us to use the original field name) Do you have any
>>> idea, how to achieve this? I am fine, if we have to specify the field
>> type
>>> during the insertion of a document, however, I do not want to keep using
>>> post-fixes while running queries...
>>> 
>>> Thanks,
>>> Peter
>> 
>> 



Re: Sorting on child document field.

2016-05-19 Thread Pranaya Behera

An example would be:
Let's say that I have a product document with regular fields such as name, 
price, desc, is_parent. It has child documents such as

CA: fields a, b, c, rank
and another child document,
CB: fields x, y, z.
I am using the query {!parent which="is_parent:true"}a:some AND 
b:somethingelse; here only the CA child documents are used for searching, no 
other child document is touched. CA has a rank field. I want to 
sort the parents using this field.
A product contains multiple CA documents, but the query matches exactly one 
document.


On Thursday 19 May 2016 04:09 PM, Pranaya Behera wrote:
While searching the Lucene code base I found 
ToParentBlockJoinSortField, but it's not in Solr or even in SolrJ. 
How would I use it with SolrJ? I can't find any way to 
query it through the UI.


On Thursday 19 May 2016 11:29 AM, Pranaya Behera wrote:

Hi,

 How can I sort the results, i.e. from a block join parent query, 
using a field from a child document?


Thanks & Regards

Pranaya Behera







Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Renaud Delbru

Hi Abdel,

have you reloaded the collection [1] after uploading the configuration 
to zookeeper ?


[1] 
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2


--
Renaud Delbru

On 16/05/16 17:29, Abdel Belkasri wrote:

Thanks Renaud.

Here is my setup:

1- I have created 2 sites: Main (source) and DR (target).
2- Both sites are the same before configuring CDCR
3- The collections (source and target) are created before configuring CDCR
4- collections are created using interactive mode: accepting most defaults
except the ports (gettingstarted collection)
5- I have a zookeeper ensemble too.
6- I change the solrconfig.xml, then I upload using the command:
# upload configset to zookeeper
zkcli.bat -cmd upconfig -zkhost  localhost:2181 -confname gettingstarted
-solrhome C:\solr\solr-6-cloud\solr-6.0.0 -confdir
C:\solr\solr-6-cloud\solr-6.0.0\server\solr\configsets\basic_configs\conf

Renaud can you send your confi files...

Thanks,
--Abdel.

On Mon, May 16, 2016 at 12:16 PM, Satvinder Singh 
wrote:


Thank you.

To summarize this is what I have, all VMS running on Centos7 :

Source Side
 |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
 |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
6.0) (Java 1.8.0_91)
 |___ sample_techproducts_config copied as 'liferay', and used to
create collections, that is where I am
  modifying the solrconfig.xml


Target Side
 |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
 |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
6.0) (Java 1.8.0_91)
 |___ sample_techproducts_config copied as 'liferay', and used to
create collections, that is where I am
  modifying the solrconfig.xml


Thanks
Satvinder Singh
Security Systems Engineer
satvinder.si...@nc4.com
703.682.6000 x276 direct
703.989.8030 cell
www.NC4.com








-Original Message-
From: Renaud Delbru [mailto:renaud@siren.solutions]
Sent: Monday, May 16, 2016 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

Thanks Satvinder,
Tomorrow, I'll try to reproduce the issue with your steps and will let you
know.

Regards
--
Renaud Delbru

On 16/05/16 16:53, Satvinder Singh wrote:

Hi,

So the way I am doing it is, for both the Target and Source side, I

took a copy of the sample_techproducts_config configset and created one
configset. Then I modified the solrconfig.xml in there, both for the Target
and Source side. And then created the collection, and I get the errors. I
get the error if I create a new collection or try to reload an existing
collection after the solrconfig update.

Attached is the log and configs.
Thanks

Satvinder Singh



Security Systems Engineer
satvinder.si...@nc4.com
703.682.6000 x276 direct
703.989.8030 cell
www.NC4.com








-Original Message-
From: Renaud Delbru [mailto:renaud@siren.solutions]
Sent: Monday, May 16, 2016 11:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

Hi,

I have tried to reproduce the problem, but was unable to.
I have downloaded the Solr 6.0 distribution, added to the solr config

the cdcr request handler and modified the update handler to register the
CdcrUpdateLog, then start Solr in cloud mode and created a new collection
using my solr config. The cdcr request handler starts properly and does not
complain about the update log.


Could you provide more background on how to reproduce the issue ? E.g.,

how do you create a new collection with the cdcr configuration.

Are you trying to configure CDCR on collections that were created prior

to the CDCR configuration ?


@Erik: I have noticed a small issue in the CDCR page of the reference

guide. In the code snippet in Configuration -> Source Configuration, the
 element is nested within the .


Thanks
Regards
--
Renaud Delbru

On 15/05/16 23:13, Abdel Belkasri wrote:

Erick,

I tried the new configuration. The same issue that Satvinder is
having. The log updater cannot be instantiated...

class="solr.CdcrUpdateLog"

for some reason that class is causing a problem!

Anyway, anyone has a config that works?

Regards,
--Abdel

On Fri, May 13, 2016 at 11:57 AM, Erick Erickson

wrote:


I changed the CDCR doc, Oliver could you take a glance and see if it
is clear now? All I changed was the sample solrconfig sections

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462

Thanks,
Erick

On Fri, May 13, 2016 at 6:23 AM, Oliver Rudolph
 wrote:

Hi,

I had the same problem. The documentation is kind of misleading here.

You

must not add a new <updateLog> element to your config but
update the existing <updateLog>. All you need to do is add the
class="solr.CdcrUpdateLog" attribute to the <updateLog> element
inside your existing <updateHandler>. Hope this helps.
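Sketched concretely, the fix amounts to one attribute on the existing updateLog (this assumes the stock DirectUpdateHandler2 and the default ulog dir value from the sample configs):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- keep the ONE existing updateLog; only add the class attribute.
       Adding a second updateLog element triggers the
       "Error Instantiating Update Handler" failure discussed above. -->
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```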

Creating dynamic field, but strip the type indicator postfix from the name

2016-05-19 Thread Horváth Péter Gergely
Hi Everyone,

I am wondering if it is possible to store dynamic fields without the type
indicator postfix. In our Solr environment, I would like to
1.) use dynamic fields ("data-driven collections" with no fixed fields
specified in advance)
2.) be able to specify the field type, but without interacting with Solr
schema API separately
3.) I do not want type indicator postfixes to appear in field names, since
that would make queries more complicated to form.

For example, I could imagine that when I add a document like this:


CloudSolrClient solr  = ...

SolrInputDocument document = new SolrInputDocument();

document.addField("foobar_s", "theValue")

solr.add("someCollection", document);

Solr would recognize that the "foobar_s" field name contains a type
indicator and it would create a field named "foobar" with the type of
string.

Is there any way to achieve this behavior (have Solr create the field with
the requested type, but without the type indicator postfix)? If yes, how
could I do that?
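One partial workaround that exists today is response-side field aliasing via the fl parameter; it hides the postfix in results only, while q still needs the real field name. A sketch, reusing the hypothetical foobar_s field from above:

```python
from urllib.parse import urlencode

# fl's alias syntax (alias:field) renames fields in the *response*,
# so callers never see the _s postfix in the returned documents.
params = urlencode({
    "q": "foobar_s:theValue",    # query side still uses the indexed name
    "fl": "id,foobar:foobar_s",  # response side: returned as "foobar"
})
print(params)
```

It is not the behavior asked for here (the postfix still leaks into queries), but it keeps result documents clean without any schema work.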

Thanks,
Peter


RE: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Satvinder Singh
Hi,

So this is what I did:

I created solr as a service. Below are the steps I followed for that:

$ tar xzf solr-X.Y.Z.tgz solr-X.Y.Z/bin/install_solr_service.sh 
--strip-components=2

$ sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz -i /opt/solr1 -d 
/var/solr1 -u solr -s solr1 -p 8501
$ sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz -i /opt/solr2 -d 
/var/solr2 -u solr -s solr2 -p 8502

Then to start it in cloud I modified the solr1.cmd.in and solr2.cmd.in in 
/etc/defaults/
I added ZK_HOST=192.168.56.103:2181,192.168.56.103:2182,192.168.56.103:2183 
(192.168.56.103 is where my 3 zookeeper instances are)

Then I started the 2 solr services solr1 and solr2

Then I created the configset
/bin/solr zk -upconfig -z 
192.168.56.103:2181,192.168.56.103:2182,192.168.56.103:2183 -n Liferay -d 
server/solr/configsets/sample_techproducts_configs/conf

Then I created the collection using:
http://192.168.56.101:8501/solr/admin/collections?action=CREATE=dingdong=1=2=liferay
This created fine

Then I deleted the solrconfig.xml from the zookeeper Liferay configset

Then I uploaded the new solrconfig.xml to the configset. 

Then when I do a reload on the collections I get the error, or if I create a new 
collection I get the error.

Thanks

Satvinder Singh
 
 
 
Security Systems Engineer
satvinder.si...@nc4.com
703.682.6000 x276 direct
703.989.8030 cell
www.NC4.com
 
 

  
 



-Original Message-
From: Renaud Delbru [mailto:renaud@siren.solutions] 
Sent: Thursday, May 19, 2016 7:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

I have reproduced your steps and the cdcr request handler started successfully. 
I have attached to this mail the config sets I have used. 
It is simply the sample_techproducts_config configset with your solrconfig.xml.

I have used solr 6.0.0 with the following commands:

$ ./bin/solr start -cloud

$ ./bin/solr create_collection -c test_cdcr -d cdcr_configs

Connecting to ZooKeeper at localhost:9983 ...
Uploading /solr-6.0.0/server/solr/configsets/cdcr_configs/conf for config 
test_cdcr to ZooKeeper at localhost:9983

Creating new collection 'test_cdcr' using command:
http://localhost:8983/solr/admin/collections?action=CREATE=test_cdcr=1=1=1=test_cdcr

{
   "responseHeader":{
 "status":0,
 "QTime":5765},
   "success":{"127.0.1.1:8983_solr":{
   "responseHeader":{
 "status":0,
 "QTime":4426},
   "core":"test_cdcr_shard1_replica1"}}}

$ curl http://localhost:8983/solr/test_cdcr/cdcr?action=STATUS



<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
  <lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
</response>



The difference is that I have used the embedded zookeeper, not a separate 
ensemble.

Could you please provide the commands you used to create the collection ?

Kind Regards
--
Renaud Delbru

On 16/05/16 16:55, Satvinder Singh wrote:
> I also am using a zk ensemble with 3 nodes on each side.
>
> Thanks
>
>
> Satvinder Singh
>
>
>
> Security Systems Engineer
> satvinder.si...@nc4.com
> 703.682.6000 x276 direct
> 703.989.8030 cell
> www.NC4.com
>
>
>
>
>
> 
>
>
> -Original Message-
> From: Satvinder Singh [mailto:satvinder.si...@nc4.com]
> Sent: Monday, May 16, 2016 11:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Need Help with Solr 6.0 Cross Data Center Replication
>
> Hi,
>
> So the way I am doing it is, for both the Target and Source side, I took 
> a copy of the sample_techproducts_config configset and created one 
> configset. Then I modified the solrconfig.xml in there, both for the Target 
> and Source side. And then created the collection, and I get the errors. I get 
> the error if I create a new collection or try to reload an existing 
> collection after the solrconfig update.
> Attached is the log and configs.
> Thanks
>
> Satvinder Singh
>
>
>
> Security Systems Engineer
> satvinder.si...@nc4.com
> 703.682.6000 x276 direct
> 703.989.8030 cell
> www.NC4.com
>
>
>
>
>
> 
>
>
> -Original Message-
> From: Renaud Delbru [mailto:renaud@siren.solutions]
> Sent: Monday, May 16, 2016 11:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
>
> Hi,
>
> I have tried to reproduce the problem, but was unable to.
> I have downloaded the Solr 6.0 distribution, added to the solr config the 
> cdcr request handler and modified the update handler to register the 
> CdcrUpdateLog, then start Solr in cloud mode and created a new collection 
> using my solr config. The cdcr request handler starts properly and does not 
> complain about the update log.
>
> Could you provide more background on how to reproduce the issue ? E.g., how 
> do you create a new collection with the cdcr configuration.
> Are you trying to configure CDCR on collections that were created prior to 
> the CDCR configuration ?
>
> @Erik: I have noticed a small issue in the CDCR page of the reference guide. 
> In the code snippet in Configuration -> Source Configuration, the 
>  element is nested within the .
>
> Thanks
> Regards

Re: SolrCloud 6 Join Stream and pagination

2016-05-19 Thread Joel Bernstein
Currently there isn't a way to page the results. There is a jira ticket to
support this for SQL:

https://issues.apache.org/jira/browse/SOLR-9078

This will be implemented first as a Streaming Expression, so it's likely to
be available soon. It should be straightforward to implement the offset()
function if you feel like working up a patch.
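Until that lands, paging has to be emulated client-side. As a rough illustration of what an offset()/limit() wrapper does conceptually (plain Python, not Solr streaming code): the skipped tuples are still pulled from the underlying stream and discarded, which is why deep offsets over a stream stay expensive.

```python
from itertools import islice

def offset_stream(tuples, offset, limit=None):
    """Skip the first `offset` tuples, then yield up to `limit` more."""
    it = iter(tuples)
    # The skipped tuples are still read from the underlying stream;
    # they are simply discarded, so cost grows with the offset.
    for _ in islice(it, offset):
        pass
    yield from (it if limit is None else islice(it, limit))

# Page 3 of a (simulated) joined stream, 10 tuples per page:
stream = ({"id": i} for i in range(100))
page = list(offset_stream(stream, offset=20, limit=10))
# page holds the tuples with ids 20..29
```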

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 2:32 AM, Roshan Kamble <
roshan.kam...@smartstreamrdu.com> wrote:

> Hello,
>
> I am using Solr 6 in cloud mode.
> In order to search within different collections I am using
> InnerJoinStream. (using qt=export in order to get correct result)
>
> Is there any way to get paginated result?
>
>
> Regards,
> Roshan
> 
> The information in this email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this email
> by anyone else is unauthorised. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful.
>


Determine Containing Handler

2016-05-19 Thread Max Bridgewater
Hi,

I am implementing a component that needs to redirect calls to the handler
that originally called it. Say the call comes to handler /search; the
component would then do some processing, alter the query, and send the
query back to /search again.

It works great. The only issue is that the handler is not always called
/search, leading me to have to force people to pass the handler name as
parameter to the component, which is not ideal.

The question thus is: is there a way to find out what handler a component
was invoked from?

I checked SolrCore and SolrQueryRequest but I can't seem to find a method
that would do this.

Thanks,
Max.


Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
On query side, right above SOF there is SynonymFilter (SF is not present on
indexing). It does the following:
organization -> organization, organisation

SOF turns this pair into: organiz, organis.
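To make the mismatch concrete, here is a toy simulation (plain Python, not the actual Lucene filters) of the two chains as described: the index side stems organization down to organ, while the query side expands the synonym pair and then applies the override dictionary. Under this model the index-side and query-side terms never meet, which is exactly why the observed match is puzzling.

```python
# Index-side chain: SnowballPorter stems "organization" -> "organ".
def index_stem(token):
    return "organ" if token.startswith("organiza") else token

# Query-side chain: SynonymFilter expansion, then StemmerOverrideFilter.
SYNONYMS = {"organization": ["organization", "organisation"]}
OVERRIDES = {"organization": "organiz", "organisation": "organis"}

def query_analyze(token):
    expanded = SYNONYMS.get(token, [token])
    # The override dictionary replaces each token (and would normally
    # mark it as a keyword so later stemmers leave it alone).
    return [OVERRIDES.get(t, t) for t in expanded]

indexed = index_stem("organization")         # "organ"
query_terms = query_analyze("organization")  # ["organiz", "organis"]
should_match = indexed in query_terms        # False
```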


On Thu, May 19, 2016 at 2:18 PM, Markus Jelsma 
wrote:

> Hello - is there a KeywordRepeatFilterFactory above the StemmerOverride?
> That would explain it.
> M.
>
>
>
> -Original message-
> > From:Dmitry Kan 
> > Sent: Thursday 19th May 2016 13:08
> > To: solr-user@lucene.apache.org
> > Subject: Re: puzzling StemmerOverrideFilterFactory
> >
> > Hi,
> >
> > Yes, I have checked the analysis page and there everything is logical,
> > stemming is done as expected. So by analysis page the search should not
> > return anything.
> >
> > On Thu, May 19, 2016 at 12:14 PM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello - that sounds odd indeed. Did you check query and indexing
> analysis?
> > > M.
> > >
> > >
> > >
> > > -Original message-
> > > > From:Dmitry Kan 
> > > > Sent: Thursday 19th May 2016 9:36
> > > > To: solr-user@lucene.apache.org
> > > > Subject: puzzling StemmerOverrideFilterFactory
> > > >
> > > > Hello!
> > > >
> > > > Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory"
> > > > dictionary="stemdict.txt" /> on query side, but not indexing. One
> rule is
> > > > mapping organization onto organiz (on query). On indexing
> > > > SnowballPorterFilterFactory will stem organization to organ. Still
> > > > searching with organization finds it in the index. Anybody has an
> idea
> > > why
> > > > this happens?
> > > >
> > > > This is on solr 4.10.2.
> > > >
> > > > Thanks,
> > > > Dmitry
> > > >
> > > > --
> > > > Dmitry Kan
> > > > Luke Toolbox: http://github.com/DmitryKey/luke
> > > > Blog: http://dmitrykan.blogspot.com
> > > > Twitter: http://twitter.com/dmitrykan
> > > > SemanticAnalyzer: www.semanticanalyzer.info
> > > >
> > >
> >
> >
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Markus Jelsma
Hello - is there a KeywordRepeatFilterFactory above the StemmerOverride? That 
would explain it.
M.

 
 
-Original message-
> From:Dmitry Kan 
> Sent: Thursday 19th May 2016 13:08
> To: solr-user@lucene.apache.org
> Subject: Re: puzzling StemmerOverrideFilterFactory
> 
> Hi,
> 
> Yes, I have checked the analysis page and there everything is logical,
> stemming is done as expected. So by analysis page the search should not
> return anything.
> 
> On Thu, May 19, 2016 at 12:14 PM, Markus Jelsma 
> wrote:
> 
> > Hello - that sounds odd indeed. Did you check query and indexing analysis?
> > M.
> >
> >
> >
> > -Original message-
> > > From:Dmitry Kan 
> > > Sent: Thursday 19th May 2016 9:36
> > > To: solr-user@lucene.apache.org
> > > Subject: puzzling StemmerOverrideFilterFactory
> > >
> > > Hello!
> > >
> > > Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory"
> > > dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
> > > mapping organization onto organiz (on query). On indexing
> > > SnowballPorterFilterFactory will stem organization to organ. Still
> > > searching with organization finds it in the index. Anybody has an idea
> > why
> > > this happens?
> > >
> > > This is on solr 4.10.2.
> > >
> > > Thanks,
> > > Dmitry
> > >
> > > --
> > > Dmitry Kan
> > > Luke Toolbox: http://github.com/DmitryKey/luke
> > > Blog: http://dmitrykan.blogspot.com
> > > Twitter: http://twitter.com/dmitrykan
> > > SemanticAnalyzer: www.semanticanalyzer.info
> > >
> >
> 
> 
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Suspicious message with attachment

2016-05-19 Thread help
The following message addressed to you was quarantined because it likely 
contains a virus:

Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
From: Renaud Delbru 

However, if you know the sender and are expecting an attachment, please reply 
to this message, and we will forward the quarantined message to you.


Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
Hi,

Yes, I have checked the analysis page and there everything is logical,
stemming is done as expected. So by analysis page the search should not
return anything.

On Thu, May 19, 2016 at 12:14 PM, Markus Jelsma 
wrote:

> Hello - that sounds odd indeed. Did you check query and indexing analysis?
> M.
>
>
>
> -Original message-
> > From:Dmitry Kan 
> > Sent: Thursday 19th May 2016 9:36
> > To: solr-user@lucene.apache.org
> > Subject: puzzling StemmerOverrideFilterFactory
> >
> > Hello!
> >
> > Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory"
> > dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
> > mapping organization onto organiz (on query). On indexing
> > SnowballPorterFilterFactory will stem organization to organ. Still
> > searching with organization finds it in the index. Anybody has an idea
> why
> > this happens?
> >
> > This is on solr 4.10.2.
> >
> > Thanks,
> > Dmitry
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Sorting on child document field.

2016-05-19 Thread Pranaya Behera
While searching in the Lucene code base I found ToParentBlockJoinSortField, 
but it's not in Solr or even in SolrJ. How would I use it with SolrJ, as I 
can't find anything to query it through the UI?


On Thursday 19 May 2016 11:29 AM, Pranaya Behera wrote:

Hi,

 How can I sort the results i.e. from a block join parent query 
using the field from child document field ?


Thanks & Regards

Pranaya Behera





Re: Highlighting phone numbers

2016-05-19 Thread marotosg
Thanks. Using the debug query returns the info I need.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-phone-numbers-tp4277491p4277712.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [scottchu] How to specify multiple zk nodes using solrstartcommand under Windows

2016-05-19 Thread scott.chu

I find a way to stop zk nodes peacefully under Windows.

If you start zk nodes in order of 1,2,3
then you should stop them in reverse order, i.e. 3,2,1
Thus you can stop the processes peacefully without any errors.

Hope this helps people!

scott.chu,scott@udngroup.com
2016/5/19 (Thu)
- Original Message - 
From: John Bickerstaff 
To: solr-user ; scott(自己) 
CC: 
Date: 2016/5/18 (Wed) 13:53
Subject: Re: [scottchu] How to specify multiple zk nodes using solrstartcommand 
under Windows


I think those zk server warning messages are expected. Until you have 3 
running instances you don't have a "Quorum" and the Zookeeper instances 
complain. Once the third one comes up they are "happy" and don't complain 

any more. You'd get similar messages if one of the Zookeeper nodes ever 
went down. 

As for the stopping of zk server - I've never had any problem issuing a 
stop command, but I'm running Linux so I may not be much good to you in 
that regard. 

On Tue, May 17, 2016 at 8:41 PM, scott.chu  wrote: 


> I tested yesterday and it proves my theory. I'll share what I do under 
> Windows on 1 PC here with you experienced guys and further newbies: 
> 
> 1>Download zookeeper 3.4.8. I unzip it and copy to 3 other different 
> folders: zk_1, zk_2, zk_3. 
> 2>For each zk_n folder, I do these things (Note: {n} means the last digit 
> in zk_n foler name): 
> a. Create a zoo_data folder under the root and create 'myid' with 
> notepad; the contents are just '{n}'. 
> b. Create zoo.cfg under conf folder with following contents: 
> clientPort=218{n} 
> initLimit=5 
> syncLimit=2 
> dataDir=D:/zk_{n}/zoo_data 
> ;if p2p-coneect-port or leader-election-port are all 
> same, then we should set maxClientCnxns=n 
> ;maxClientCnxns=3 
> ;server.x=host:p2p-connect-port:leader-election-port 
> server.1=localhost:2888:3888 
> server.2=localhost:2889:3889 
> server.3=localhost:2890:3890 
> 3> I download ZOOKEEPER-1122's zkServer.cmd. and go into each zk_n folder 
> and issue command: 
> bin\zkServer.cmd start 
> 
> [Question]: There's something I'd like to ask guys: When I start 
> zk_1, zk_2, the console keeps shows some warning messages. 
> Only after I start zk_3, the warning messages 
> is stopped. Is that normal? 
> 
> 4> I use zkui_win to see them all go online successfully. 
> 5> I goto Solr-5.4.1 folder, and issue following commands: 
> bin\solr start -c -s mynodes\node1 -z localhost:2181 
> bin\solr start -c -s mynodes\node1 -z localhost:2181 -p 
> 7973 
> bin\solr create -c cugna -d myconfigsets\cugna -shards 
> 1 -replicationFactor 2 -p 8983 
> 6> By using zkui_win again, I see: 
> ** Config 'cugna' is synchronized on zk_1 to zk_3. So this 
> proves my theory: we only have to specify one zk node and they'll 
> sync themselves. ** 
> 
> [Question]: I go into zk_n folder and issue 'bin\zkServer stop'. However, 
> this shows error message. It seems it can't taskkill the zk process for 
> some reason. The only way I stop them 
> is by closing DOS windows that has issued the 
> 'bin\zkServer start' command. Does anybody know why 'bin\zkServer stop' 
> doesn't work? 
> 
> Note: Gotta say sorry for the repetition of localhost:2181. It's my typo. 
> 
> scott.chu,scott@udngroup.com 
> 2016/5/18 (Wed) 
> - Original Message - 
> From: Abdel Belkasri 
> To: solr-user 
> CC: 
> Date: 2016/5/18 (Wed) 00:17 
> Subject: Re: [scottchu] How to specify multiple zk nodes using solr 
> startcommand under Windows 
> 
> 
> The repetition is just a cut and paste from Scott's post. 
> 
> How can I check if I am getting the ensemble or just a single zk? 
> 
> Also if this is not the way to specify an ensemble, what is the right way? 
> 
> 
> Because the comma delimited list does not work, I concur with Scott. 
> 
> On Tue, May 17, 2016 at 11:49 AM, Erick Erickson  
> 
> wrote: 
> 
> > Are you absolutely sure you're getting an _ensemble_ and 
> > not just connecting to a single node? My suspicion (without 
> > proof) is that you're just getting one -z option. It'll work as 
> > long as that ZK instance stays up, but it won't be fault-tolerant. 
> > 
> > And again you repeated the port (2181) twice. 
> > 
> > Best, 
> > Erick 
> > 
> > On Tue, May 17, 2016 at 8:02 AM, Abdel Belkasri  
> > wrote: 
> > > Hi Scott, 
> > > what worked for me in Windows is this (no ",") 
> > > bin\Solr start -c -s mynodes\node1 -z localhost:2181 -z localhost:2181 
> -z 
> > > localhost:2183 
> > > 
> > > -- Hope this helps 
> > > Abdel. 
> > > 
> > > On Tue, May 17, 2016 at 3:35 AM, scott.chu  
> > wrote: 
> > > 
> > >> I start 3 zk nodes at port 2181,2182, and 2183 on my local machine. 

> > >> Go into Solr 5.4.1 root folder and issue and issue the command in 
> > article 
> > >> 'Setting Up an External ZooKeeper Ensemble' in reference guide 
> > >> 
> > >> bin\Solr start -c -s mynodes\node1 -z 
> > >> localhost:2181,localhost:2181,localhost:2183 

Hostname of Solr server in Velocity template?

2016-05-19 Thread Alex Ott
Hello

I have code that needs to work with multiple Solr instances.
The code receives a REST API call initiated from the UI that Solr generates
from the Velocity template, and I need to know the hostname of the server
that initiated this call. But I can't find what parameter I can use to get
the hostname of the current Solr instance. I can of course write Javascript
to handle this task, but maybe there is a builtin Velocity property I can
ask for the host & port of the current server?

Thank you

-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott


RE: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Markus Jelsma
Hello - that sounds odd indeed. Did you check query and indexing analysis?
M.

 
 
-Original message-
> From:Dmitry Kan 
> Sent: Thursday 19th May 2016 9:36
> To: solr-user@lucene.apache.org
> Subject: puzzling StemmerOverrideFilterFactory
> 
> Hello!
> 
> Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
> mapping organization onto organiz (on query). On indexing
> SnowballPorterFilterFactory will stem organization to organ. Still
> searching with organization finds it in the index. Anybody has an idea why
> this happens?
> 
> This is on solr 4.10.2.
> 
> Thanks,
> Dmitry
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-19 Thread Horváth Péter Gergely
Hi Steve,

Yes, I know the Schema API; however, I do not want to specify the field type
programmatically for every single field.

I would like to be able to specify the field type when it is being added
(similar to the name postfixes, but without affecting the field names).

Thanks,
Peter


2016-05-17 17:08 GMT+02:00 Steve Rowe :

> Hi Peter,
>
> Are you familiar with the Schema API?: <
> https://cwiki.apache.org/confluence/display/solr/Schema+API>
>
> You can use it to create fields, field types, etc. prior to ingesting your
> data.
>
> --
> Steve
> www.lucidworks.com
>
> > On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
> peter.gergely.horv...@gmail.com> wrote:
> >
> > Hi All,
> >
> > By default Solr allows you to define the type of a dynamic field by
> > appending a post-fix to the name itself. E.g. creating a color_s field
> > instructs Solr to create a string field. My understanding is that if we
> do
> > this, all queries must refer the post-fixed field name as well. So
> > instead of a query like color:"red", we will have to write something like
> > color_s:"red" -- and so on for other field types as well.
> >
> > I am wondering if it is possible to specify the data type used for a
> field
> > in Solr 6.0.0, without having to modify the field name. (Or at least in a
> > way that would allow us to use the original field name) Do you have any
> > idea, how to achieve this? I am fine, if we have to specify the field
> type
> > during the insertion of a document, however, I do not want to keep using
> > post-fixes while running queries...
> >
> > Thanks,
> > Peter
>
>
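For illustration, a Schema API request of the kind Steve describes declares an explicit type for a bare field name, so queries can use the original name without a `_s`-style suffix. The field name `color` is taken from the example in this thread; the values are placeholders. The JSON body is POSTed to `/solr/<collection>/schema`:

```json
{
  "add-field": {
    "name": "color",
    "type": "string",
    "stored": true
  }
}
```

After this, documents can be ingested with a plain `color` field and queried as `color:"red"`.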


Re: Hierarchial Support - Solr

2016-05-19 Thread Charlie Hull

On 18/05/2016 17:55, thiaga rajan wrote:



Hi Team,
We are exploring solr for one of our project as a search engine. It was a 
really a great tool around indexing and response time. While we are exploring 
we got the below questions and understandings. Kindly confirm the same.

We are actually trying to implement the search engine for a hierarchical 
search. (Tree structure). We have flatten our data structure and exported the 
data in to solr as solr is more meant for flat structure in terms of request 
and response
Example:
1 -- 11 -- 111, 112, 113
  -- 12 -- 121
  -- 13 -- 131, 132, 133
Our data structure will resemble something like the below and exported the same 
in Solr. Now when the customer enters the key for search, we need to search in 
the below structure. Each of the row in the below table corresponds to the each 
of the document in Solr.
We were able to achieve this in the below structure but need to confirm on the 
below items.
1. Search the level with the keyword and send only the matched node and its 
parents, not the children of the node.
Example: if the user enters 12, then we need only 1, 12 and not the children 
(i.e. 121).
I assume we don't have a choice to achieve this with Solr and we need to 
write a custom implementation for this. Correct me if I am wrong.
2. We need to do a distinct selection for each of the documents. Example: if 
the search keyword is 11, then I should send 1, 11 (kind of a SQL distinct: 
SELECT DISTINCT L1, L2). I have read various forums on this and it looks like 
we have options around faceting and grouping, but not exactly the same as the 
SQL distinct we are asking for.
If we are not able to get distinct results, how will we apply pagination? 
Example: if the page size is 1 to 10 and we have 5 duplicate documents, then 
taking a distinct of those might fail, as we still have space for another 5 
records. We understand faceting is arranging the results based on a faceted 
field rather than giving the results in the requested way.
Please confirm of our understanding on the above options. Thanks
Document structure below


Hi,

This sounds a bit like the Siren implementation 
https://siren.solutions/siren/faqs/ where the hierarchical structure is 
encoded in a document. I'm pretty sure this is a commercial add-on though.


You might also take a look at the open source ontology indexer for Solr 
we developed as part of the BioSolr project:

https://github.com/flaxsearch/BioSolr/tree/master/ontology

Regards

Charlie




| L1 | L2 | L3 |
| 1 | 11 | 111 |
| 1 | 11 | 112 |
| 1 | 11 | 113 |
| 1 | 12 | 121 |
| 1 | 13 | 131 |
| 1 | 13 | 132 |
| 1 | 13 | 133 |
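The two requirements above can be sketched outside Solr over the flattened rows (plain Python, purely to illustrate the logic a custom component or client would implement, not a Solr feature):

```python
# Flattened hierarchy rows, one tuple per (L1, L2, L3) document.
rows = [
    ("1", "11", "111"), ("1", "11", "112"), ("1", "11", "113"),
    ("1", "12", "121"),
    ("1", "13", "131"), ("1", "13", "132"), ("1", "13", "133"),
]

def ancestors_of_match(rows, keyword):
    """Requirement 1: return the matched node plus its ancestors, never children."""
    paths = set()
    for row in rows:
        for depth, value in enumerate(row):
            if value == keyword:
                paths.add(row[: depth + 1])  # e.g. ("1", "12") for keyword "12"
    return sorted(paths)

def distinct_levels(rows, depth):
    """Requirement 2: SQL-style SELECT DISTINCT over the first `depth` levels."""
    return sorted({row[:depth] for row in rows})

ancestors_of_match(rows, "12")  # keyword 12 -> path (1, 12), child 121 excluded
distinct_levels(rows, 2)        # distinct (L1, L2) pairs, duplicates collapsed
```

Pagination would then be applied to the deduplicated list, which is why distinct selection has to happen before the page window is cut.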








--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
Hello!

Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
mapping organization onto organiz (on query). On indexing
SnowballPorterFilterFactory will stem organization to organ. Still
searching with organization finds it in the index. Anybody has an idea why
this happens?

This is on solr 4.10.2.

Thanks,
Dmitry

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


SolrCloud 6 Join Stream and pagination

2016-05-19 Thread Roshan Kamble
Hello,

I am using Solr 6 in cloud mode.
In order to search within different collections I am using InnerJoinStream. 
(using qt=export in order to get correct result)

Is there any way to get paginated result?


Regards,
Roshan

The information in this email is confidential and may be legally privileged. It 
is intended solely for the addressee. Access to this email by anyone else is 
unauthorised. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken or omitted to be taken in reliance on it, is 
prohibited and may be unlawful.


How to stop searches to solr while full data import is going in SOLR

2016-05-19 Thread preeti kumari
Hi,

I am using solr 5.2.1. I have two clusters Primary A and Primary B.
I was pinging servers to check whether they are up or not to route the
searches to working cluster A or B.

But while I am running a full data import in primary cluster A, the index
does not yet contain all the data, and pinging servers will not help as my
solr servers would still be responding.

But I want my searches to go to Cluster B instead of A.

Please help me with a way for solr to indicate that it is not ready to serve
searches while a full data import is running.

Thanks
Preeti


Re: Sub faceting on string field using json facet runs extremly slow

2016-05-19 Thread Vijay Tiwary
Can somebody confirm whether the jira SOLR-8096 will also affect JSON facets,
as I see sub faceting using a term facet on a string field running 5x slower
than on an integer field for the same number of hits and unique terms.
On 17-May-2016 3:33 pm, "Vijay Tiwary"  wrote:

> Below is the request
>
> q=*:*&rows=0&start=0&json.facet={
>   "customer_id": {
>     "type": "terms",
>     "limit": -1,
>     "field": "cid_ti",
>     "mincount": 1,
>     "facet": {
>       "contact_s": {
>         "type": "terms",
>         "limit": 1,
>         "field": "contact_s",
>         "mincount": 1
>       }
>     }
>   }
> }&fq=age_td:[25 TO 50]
>
>
>
>
>
>
>
>
>
> On 17-May-2016 2:20 pm, "chandan khatri"  wrote:
>
>> Can you please share the query for sub faceting?
>>
>> On Tue, May 17, 2016 at 2:13 PM, Vijay Tiwary 
>> wrote:
>>
>>> Hello all,
>>> I have an index of 8 shards having 1 replica each, distributed across an
>>> 8-node
>>> solr cloud. Size of index is 300 gb having 30 million documents. Solr
>>> json
>>> facet runs extremely slow if I am sub faceting on a string field, even if
>>> the numFound is only around 2 (also I am not returning any rows, i.e.
>>> rows=0).
>>> Is there any way to improve the performance?
>>>
>>> Thanks,
>>> Vijay
>>>
>>
>>


Sorting on child document field.

2016-05-19 Thread Pranaya Behera

Hi,

 How can I sort the results i.e. from a block join parent query 
using the field from child document field ?


Thanks & Regards

Pranaya Behera