Re: Question regarding the SQL interface

2016-05-19 Thread Joel Bernstein
I just reviewed the testPredicate method in the test cases:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/handler/TestSQLHandler.java

All the test cases in testPredicate() are formatted like regular SQL. Given
the way things are designed, I don't think you could make a valid query that
combines fields the way a typical search does.

You will have to make separate calls for each aggregation. To get faceting
performance you would use the facet aggregationMode.

The SQL predicate gets rewritten to a valid Solr query, and then gets
handled by the QueryComponent, like a regular query. So any field
definitions should work fine. But scoring is only performed for queries
with a LIMIT clause.

With the cardinality issue you'll need to experiment a little to see where
the facet mode starts to slow down and lose accuracy. In the future we'll
be moving to streaming facets so cardinality won't be an issue even in
facet mode. So in future releases MapReduce will only be used to handle
distributed joins.

In facet mode it uses the JSON facet API. It scales reasonably well, but I
don't believe it provides fully accurate counts because it doesn't do the
refinement step. In my testing I didn't push it far enough to make it fall
over, but it eventually will, because it keeps all the aggregation buckets in
memory at once. MapReduce mode is always accurate no matter how high the
cardinality gets.
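For reference, the choice between the two modes is made per request. A minimal sketch of building such a request against the Solr 6 /sql handler (the endpoint path and the stmt/aggregationMode parameter names are from the Parallel SQL docs; the collection and statement are illustrative):

```python
from urllib.parse import urlencode

def sql_request_path(collection, stmt, aggregation_mode="facet"):
    """Build the request path for Solr's /sql handler (Solr 6+).

    aggregation_mode: "facet" (JSON facet API: fast, but buckets are held
    in memory) or "map_reduce" (accurate at any cardinality).
    """
    params = {"stmt": stmt, "aggregationMode": aggregation_mode}
    return "/solr/%s/sql?%s" % (collection, urlencode(params))

# Illustrative aggregation query using facet mode:
path = sql_request_path(
    "collection1",
    "SELECT title, count(*) FROM collection1 GROUP BY title",
)
```

Switching `aggregation_mode` to `"map_reduce"` trades speed for exact counts at high cardinality, per the discussion above.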





Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 4:05 PM, Vachon, Jean-Sébastien <
jvac...@cebglobal.com> wrote:

> Hi all,
>
> I am planning into migrating our application from SolrJ to the SQL
> interface and I have some questions regarding some of Solr features…
>
>
>   *   How can we specify multiple search fields on a keyword. Do we have
> to handle everything by ourselves like in regular SQL?
>
> SELECT x,y,z FROM collection1 WHERE title='abc' OR description='abc'
>
> Is there a special syntax to allow to search into multiple fields at once?
>
>
>   *   Do you have to generate separate requests to get faceting
> information? Would translating the following query into its SQL equivalent
> require 3 queries?
>
> /select?q=title:abc&facet=true&facet.field=xyz&facet.field=def
>
>
>   *   If our schema contains a fieldType using a custom similarity class…
> will the SQL interface honour that mapping?
>
>   *   The documentation about Streaming Expressions and SQL interface are
> referring to terms like “high cardinality” and “very high cardinality”.
> What do they exactly mean? Are we talking about hundreds, thousands or
> millions of different values? Does this depend on other aspect of the
> collection like the size of the documents?
>
> Thanks for your input and guidance
>
>
>
> CEB Canada Inc. Registration No: 1781071. Registered office: 199 Bay
> Street Commerce Court West, # 2800, Toronto, Ontario, Canada, M5L 1AP.
>
>
>
> This e-mail and/or its attachments are intended only for the use of the
> addressee(s) and may contain confidential and legally privileged
> information belonging to CEB and/or its subsidiaries, including SHL. If you
> have received this e-mail in error, please notify the sender and
> immediately, destroy all copies of this email and its attachments. The
> publication, copying, in whole or in part, or use or dissemination in any
> other way of this e-mail and attachments by anyone other than the intended
> person(s) is prohibited.
>
>
>


Re: Stemming nouns ending in 'y'

2016-05-19 Thread Erick Erickson
Mark:

Just a sanity check, was the indexing porter stemmer defined when you
indexed your _first_ document? The admin/analysis page will tell you
what the term is stemmed to at both query and index time.

I'm puzzled by this statement:

bq:  As example, the term 'osteopathy' stemmed with the Porter Stemmer
Filter stems to 'osteopathi', which will match 'osteopath' and
'osteopathic'

Why do you think this will match? The stemmer wouldn't stem 'osteopath' to
the term in the index, namely 'osteopathi', and thus it wouldn't match. Or at
least it shouldn't. So I'm probably missing something here...

Best,
Erick

On Thu, May 19, 2016 at 12:31 PM, Markus Jelsma
 wrote:
> Hello - try the KStem filter. It is better suited for English and doesn't
> show this behaviour.
> Markus
>
>
>
> -Original message-
>> From:Mark Vega 
>> Sent: Thursday 19th May 2016 19:55
>> To: solr-user@lucene.apache.org
>> Subject: Stemming nouns ending in 'y'
>>
>> I am using Apache Nutch v1.10 and SOLR v.5.2.1 to index and search a medical 
>> website and am trying to find out why every stemmer I've tried on certain 
>> nouns in medical terminology ending in 'y' merely replaces the ending 'y' 
>> with an 'I'.  As example, the term 'osteopathy' stemmed with the Porter 
>> Stemmer Filter stems to 'osteopathi', which will match 'osteopath' and 
>> 'osteopathic', but will not match the original term 'osteopathy' itself.  
>> I've seen this with quite a few medical and science nouns ending in 'y'  
>> (though, oddly enough, the word 'terminology' itself stems to 'terminolog' 
>> just as I would expect it to) and am wondering whether there is a different 
>> stemmer I should be using, or if I am just using this one incorrectly.  I am 
>> currently applying the PorterStemFilterFactory to a field of type 'text' in 
>> both the indexing and querying analyzers.  Any comments, suggestions or 
>> explanations would be much appreciated.
>>
>> --
>> Mark F. Vega
>> Programmer/Analyst
>> UC Irvine Libraries - Web Services
>> veg...@uci.edu
>> 949.824.9872
>> --
>>
>>


calculate average memory per document

2016-05-19 Thread vitaly bulgakov
Hi, I have solr 4.2
I am wondering if it is possible to compute an average memory per document
in my index.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/calculate-average-memory-per-document-tp4277865.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fw: Select distinct multiple fields

2016-05-19 Thread thiaga rajan


Thanks Joel for the response. In our requirement there is some logic that needs
to be implemented after fetching the results from Solr, which might have an
impact on working out the pagination.

That is, we have flattened a nested structure into the data layout below. We
need this kind of flat structure because we have to support other use cases
that a hierarchical structure would not, so the search needs to work over the
flat structure below:

| Level1 | Level2 | Level3 |
| 1 | 11 | 111 |
| 1 | 11 | 112 |
| 1 | 11 | 113 |
| 1 | 11 | 114 |

Example - when the customer enters 11, we might need to query this word across
the entire data structure, so we will get all the records, including Level3.
But ideally we need to select only 1,11 (filtering on the current level and the
parent level). Another problem is pagination: we might select 10 records, but
after filtering the levels/parents matching the search keyword, the number of
records might be reduced. So we might need to send another request to Solr to
get the next set, again working out which levels and parents match the search
keyword, until we reach the required row count.

Rather than doing this, is there a way (some kind of plugin, like a
SearchComponent) that would help with the above scenario, or a better way to
achieve this in Solr? Kindly provide your valuable suggestions on this.
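The refill loop described here (request a page, drop rows that fail the level/parent filter, request more until the page is full) can be sketched client-side; `fetch_page` is a hypothetical stand-in for the actual Solr call, and the filtering predicate is supplied by the caller:

```python
def fill_page(fetch_page, keep, page_size, max_batches=50):
    """Collect `page_size` rows that pass `keep`, re-querying as needed.

    fetch_page(start, rows) -> list of result rows (empty when exhausted).
    keep(row) -> True for rows whose level/parent matches the keyword.
    """
    out, start = [], 0
    for _ in range(max_batches):
        batch = fetch_page(start, page_size)
        if not batch:
            break  # no more results in the index
        out.extend(r for r in batch if keep(r))
        start += len(batch)
        if len(out) >= page_size:
            break  # collected a full page of filtered rows
    return out[:page_size]
```

This is exactly the multi-round-trip cost the poster wants to avoid; a server-side SearchComponent would move the `keep` filtering into Solr so one request suffices.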



   On Thursday, 19 May 2016 6:11 PM, Joel Bernstein  
wrote:
 

 
https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface?focusedCommentId=62697742#ParallelSQLInterface-SELECTDISTINCTQueries

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 1:10 PM, Joel Bernstein  wrote:

The SQL interface and Streaming Expressions support selecting multiple distinct 
fields.
The SQL interface can use the JSON facet API or MapReduce to provide the 
results.
The facet function and the unique function are the Streaming Expressions that
the SQL interface calls.
Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 12:41 PM, thiaga rajan 
 wrote:

Hi Team - I have seen that selecting distinct values of multiple fields is not
possible in Solr, and I have seen suggestions involving faceting and grouping.
I have some questions. Is there any kind of plugin or custom implementation
with which we can achieve the same?
1. Using a plugin or a custom implementation, would we be able to select
distinct fields apart from facet and group by? Because pagination is an issue.
For example - we set a page size of 10. If we get 10 records (along with
duplicates), then after removing duplicates we might end up with fewer than 10
results.
Any suggestions on this?





  

Question regarding the SQL interface

2016-05-19 Thread Vachon, Jean-Sébastien
Hi all,

I am planning to migrate our application from SolrJ to the SQL interface and I
have some questions regarding some of Solr's features…


  *   How can we specify multiple search fields for a keyword? Do we have to
handle everything ourselves, as in regular SQL?

SELECT x,y,z FROM collection1 WHERE title='abc' OR description='abc'

Is there a special syntax to allow to search into multiple fields at once?


  *   Do you have to generate separate requests to get faceting information? 
Would translating the following query into its SQL equivalent require 3 queries?

/select?q=title:abc&facet=true&facet.field=xyz&facet.field=def


  *   If our schema contains a fieldType using a custom similarity class… will 
the SQL interface honour that mapping?

  *   The documentation about Streaming Expressions and the SQL interface
refers to terms like “high cardinality” and “very high cardinality”. What
exactly do they mean? Are we talking about hundreds, thousands or millions of
different values? Does this depend on other aspects of the collection, like the
size of the documents?

Thanks for your input and guidance







RE: Stemming nouns ending in 'y'

2016-05-19 Thread Markus Jelsma
Hello - try the KStem filter. It is better suited for English and doesn't show
this behaviour.
Markus

 
 
-Original message-
> From:Mark Vega 
> Sent: Thursday 19th May 2016 19:55
> To: solr-user@lucene.apache.org
> Subject: Stemming nouns ending in 'y'
> 
> I am using Apache Nutch v1.10 and SOLR v.5.2.1 to index and search a medical 
> website and am trying to find out why every stemmer I've tried on certain 
> nouns in medical terminology ending in 'y' merely replaces the ending 'y' 
> with an 'I'.  As example, the term 'osteopathy' stemmed with the Porter 
> Stemmer Filter stems to 'osteopathi', which will match 'osteopath' and 
> 'osteopathic', but will not match the original term 'osteopathy' itself.  
> I've seen this with quite a few medical and science nouns ending in 'y'  
> (though, oddly enough, the word 'terminology' itself stems to 'terminolog' 
> just as I would expect it to) and am wondering whether there is a different 
> stemmer I should be using, or if I am just using this one incorrectly.  I am 
> currently applying the PorterStemFilterFactory to a field of type 'text' in 
> both the indexing and querying analyzers.  Any comments, suggestions or 
> explanations would be much appreciated.
> 
> --
> Mark F. Vega
> Programmer/Analyst
> UC Irvine Libraries - Web Services
> veg...@uci.edu
> 949.824.9872
> --
> 
> 


Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Shawn Heisey
On 5/19/2016 5:26 AM, Dmitry Kan wrote:
> On query side, right above SOF there is SynonymFilter (SF is not present on
> indexing). It does the following:
> organization -> organization, organisation
>
> SOF turns this pair into: organiz, organis.

Can you put the field and fieldType definitions, plus all files
referenced in those definitions (like the stemdict.txt file), someplace
on the Internet we can reach, and give us URL(s) to reach it?  You could
use gist, http://apaste.info, or similar.  Email attachments often don't
work on the mailing list, so I don't recommend using them.

If you put an expiration date on whatever you use, make it at least one
month out.

I see that you mentioned this on IRC as well, EARLY in the morning for
me.  I will be sporadically checking there.

Thanks,
Shawn



Re: Inconsistent Solr document count on Target clouds when replicating data in Solr6 CDCR

2016-05-19 Thread Renaud Delbru

Hi Dmitry,

You can activate debug logging to see more information, such as the number
of documents replicated by the CDCR replicator thread.


However, I think the issue is that the indexes on the target instances are not
refreshed, and therefore some of the indexed documents are not yet visible.
CDCR does not replicate commit operations; it lets the target cluster handle
the refresh. You can try manually executing a commit operation on the target
cluster and see if all the documents appear.
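The manual commit suggested above can be issued over HTTP against the target collection; a small sketch building the URL (`/update?commit=true` is the standard Solr commit request; the host and the "demo" collection name here are illustrative):

```python
from urllib.parse import urlencode

def commit_url(base_url, collection):
    """URL for an explicit hard commit on a collection.

    By default a hard commit opens a new searcher, which is what makes
    the replicated documents visible on the target.
    """
    params = {"commit": "true"}
    return "%s/solr/%s/update?%s" % (base_url.rstrip("/"), collection,
                                     urlencode(params))

# e.g.: urllib.request.urlopen(commit_url("http://target-host:8983", "demo"))
```

If the counts match after this, the mismatch was only searcher visibility, not lost updates.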


Kind Regards
--
Renaud Delbru

On 19/05/16 17:39, dmitry.medve...@barclays.com wrote:

I've come across a weird problem which I'm trying to debug at the moment, and 
was just wondering if anyone has stumbled across it too:

I have an active-passive-passive configuration (1 Source cloud, 2 targets), and 
NOT all the documents are being replicated to the target clouds. Example: 3 
docs are being pushed/indexed on the Source cloud, S1, S2, S3, and only 2 docs 
can be found (almost immediately) on the Target clouds, say T1, T3. The 
behavior is NOT consistent.

I feel like it's a configuration issue, but it could also be a bug. How can I 
debug this issue?

What log files should I examine?

I couldn't find anything in the logs (of both the Source & Target clouds).



Source configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.88.52.219:9983,10.36.75.4:9983</str>
    <str name="source">demo</str>
    <str name="target">demo</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">10</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

Target(s) configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-proc-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-proc-chain</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>




Thnx,
Dmitry Medvedev
Tech search leader
BARCLAYS CAPITAL
Search Platform Engineering
Global Technology Infrastructure Services  (GTIS)
Barclays Capital, Atidim High-Tech Industrial Park, Tel Aviv 61580
* DDI : +972-3-5452462 * Mobile : +972-545874521
* 
dmitry.medve...@barclayscapital.com

P Please consider the environment before printing this email


___

This message is for information purposes only, it is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is not intended for retail customer use. Intended for 
recipient only. This message is subject to the terms at: 
www.barclays.com/emaildisclaimer.

For important disclosures, please see: 
www.barclays.com/salesandtradingdisclaimer regarding market commentary from 
Barclays Sales and/or Trading, who are active market participants; and in 
respect of Barclays Research, including disclosures relating to specific 
issuers, please see http://publicresearch.barclays.com.

___





Stemming nouns ending in 'y'

2016-05-19 Thread Mark Vega
I am using Apache Nutch v1.10 and SOLR v.5.2.1 to index and search a medical 
website and am trying to find out why every stemmer I've tried on certain nouns 
in medical terminology ending in 'y' merely replaces the ending 'y' with an 
'I'.  As an example, the term 'osteopathy' stemmed with the Porter Stemmer Filter 
stems to 'osteopathi', which will match 'osteopath' and 'osteopathic', but will 
not match the original term 'osteopathy' itself.  I've seen this with quite a 
few medical and science nouns ending in 'y'  (though, oddly enough, the word 
'terminology' itself stems to 'terminolog' just as I would expect it to) and am 
wondering whether there is a different stemmer I should be using, or if I am 
just using this one incorrectly.  I am currently applying the 
PorterStemFilterFactory to a field of type 'text' in both the indexing and 
querying analyzers.  Any comments, suggestions or explanations would be much 
appreciated.
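The asymmetry between 'osteopathy' and 'terminology' falls out of which Porter rules fire. A simplified sketch of just the two relevant rules (step 1c turns a final 'y' into 'i' when the stem contains a vowel; a later amendment to step 2, present in Lucene's implementation, then rewrites a trailing 'logi' to 'log'). This illustrates only those two rules and is not a full Porter implementation:

```python
def _has_vowel(stem):
    # Simplified check; the real algorithm also treats 'y' as a vowel
    # in some positions.
    return any(c in "aeiou" for c in stem)

def porter_y_rules(word):
    """Apply only Porter step 1c and the step-2 'logi' -> 'log' rule."""
    if word.endswith("y") and _has_vowel(word[:-1]):
        word = word[:-1] + "i"   # step 1c: osteopathy -> osteopathi
    if word.endswith("logi"):
        word = word[:-1]         # step 2:  terminologi -> terminolog
    return word
```

'terminology' hits both rules (y -> i, then logi -> log), while 'osteopathy' only hits step 1c, stranding the 'i' that prevents a match on the unstemmed surface form.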

--
Mark F. Vega
Programmer/Analyst
UC Irvine Libraries - Web Services
veg...@uci.edu
949.824.9872
--



Re: Select distinct multiple fields

2016-05-19 Thread Joel Bernstein
https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface?focusedCommentId=62697742#ParallelSQLInterface-SELECTDISTINCTQueries

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 1:10 PM, Joel Bernstein  wrote:

> The SQL interface and Streaming Expressions support selecting multiple
> distinct fields.
>
> The SQL interface can use the JSON facet API or MapReduce to provide the
> results.
>
> The facet function and the unique function are the Streaming Expressions
> that the SQL interface calls.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, May 19, 2016 at 12:41 PM, thiaga rajan <
> ecethiagu2...@yahoo.co.in.invalid> wrote:
>
>> Hi Team - I have seen select distinct multiple fields is not possible in
>> Solr and i have seen suggestions coming up on faceting and grouping. I have
>> some questions. Is there any with any kind of plugins/custom implementation
>> we can achieve the same
>> 1. Using any plugin or through custom implementation whether we will be
>> able to achieve the select distinct fields apart from facet and group
>> by...Because the pagination is kind of issue.
>> For example - We are setting a pagination of 10. If we are getting 10
>> records (along with the duplicates) then we might ending up a getting the
>> results less than 10.
>> Any suggestions on this?
>
>
>


Re: Select distinct multiple fields

2016-05-19 Thread Joel Bernstein
The SQL interface and Streaming Expressions support selecting multiple
distinct fields.

The SQL interface can use the JSON facet API or MapReduce to provide the
results.

The facet function and the unique function are the Streaming Expressions
that the SQL interface calls.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 12:41 PM, thiaga rajan <
ecethiagu2...@yahoo.co.in.invalid> wrote:

> Hi Team - I have seen select distinct multiple fields is not possible in
> Solr and i have seen suggestions coming up on faceting and grouping. I have
> some questions. Is there any with any kind of plugins/custom implementation
> we can achieve the same
> 1. Using any plugin or through custom implementation whether we will be
> able to achieve the select distinct fields apart from facet and group
> by...Because the pagination is kind of issue.
> For example - We are setting a pagination of 10. If we are getting 10
> records (along with the duplicates) then we might ending up a getting the
> results less than 10.
> Any suggestions on this?


Inconsistent Solr document count on Target clouds when replicating data in Solr6 CDCR

2016-05-19 Thread dmitry.medvedev
I've come across a weird problem which I'm trying to debug at the moment, and 
was just wondering if anyone has stumbled across it too:

I have an active-passive-passive configuration (1 Source cloud, 2 targets), and 
NOT all the documents are being replicated to the target clouds. Example: 3 
docs are being pushed/indexed on the Source cloud, S1, S2, S3, and only 2 docs 
can be found (almost immediately) on the Target clouds, say T1, T3. The 
behavior is NOT consistent.

I feel like it's a configuration issue, but it could also be a bug. How can I 
debug this issue?

What log files should I examine?

I couldn't find anything in the logs (of both the Source & Target clouds).



Source configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="replica">
    <str name="zkHost">10.88.52.219:9983,10.36.75.4:9983</str>
    <str name="source">demo</str>
    <str name="target">demo</str>
  </lst>
  <lst name="replicator">
    <str name="threadPoolSize">2</str>
    <str name="schedule">10</str>
    <str name="batchSize">128</str>
  </lst>
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>

Target(s) configuration:

<requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
  <lst name="buffer">
    <str name="defaultState">disabled</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="cdcr-proc-chain">
  <processor class="solr.CdcrUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">cdcr-proc-chain</str>
  </lst>
</requestHandler>

<updateLog class="solr.CdcrUpdateLog">
  <str name="dir">${solr.ulog.dir:}</str>
</updateLog>




Thnx,
Dmitry Medvedev
Tech search leader
BARCLAYS CAPITAL
Search Platform Engineering
Global Technology Infrastructure Services  (GTIS)
Barclays Capital, Atidim High-Tech Industrial Park, Tel Aviv 61580
* DDI : +972-3-5452462 * Mobile : +972-545874521
* 
dmitry.medve...@barclayscapital.com

P Please consider the environment before printing this email




Select distinct multiple fields

2016-05-19 Thread thiaga rajan
Hi Team - I have seen that selecting distinct values of multiple fields is not
possible in Solr, and I have seen suggestions involving faceting and grouping.
I have some questions. Is there any kind of plugin or custom implementation
with which we can achieve the same?
1. Using a plugin or a custom implementation, would we be able to select
distinct fields apart from facet and group by? Because pagination is an issue.
For example - we set a page size of 10. If we get 10 records (along with
duplicates), then after removing duplicates we might end up with fewer than 10
results.
Any suggestions on this?

Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Jeff Wartes
That case related to consistency after a ZK outage or network connectivity
issue. Your case is standard operation, so I'm not sure that's really the same
thing. I'm aware of a few issues that can happen if ZK connectivity goes
wonky, which I hope are fixed in SOLR-8697.

This one might be a closer match to your problem though: 
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3CCAOWq+=iePCJjnQiSqxgDVEPv42Pi7RUtw0X0=9f67mpcm99...@mail.gmail.com%3E




On 5/19/16, 9:10 AM, "Aleksey Mezhva"  wrote:

>Bump.
>
>this thread is with someone having a similar issue:
>
>https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E
>
>It seems like this is not really fixed in 5.4/6.0?
>
>
>Aleksey
>
>From: Steve Weiss 
>Date: Tuesday, May 17, 2016 at 7:25 PM
>To: "solr-user@lucene.apache.org" 
>Cc: Aleksey Mezhva , Hans Zhou 
>Subject: Re: SolrCloud replicas consistently out of sync
>
>Gotcha - well that's nice.  Still, we seem to be permanently out of sync.
>
>I see this thread with someone having a similar issue:
>
>https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E
>
>It seems like this is not really fixed in 5.4/6.0?  Is there any version of 
>SolrCloud where this wasn't yet a problem that we could downgrade to?
>
>--
>Steve
>
>On Tue, May 17, 2016 at 6:23 PM, Markus Jelsma 
>> wrote:
>Hi, thats a known issue and unrelated:
>https://issues.apache.org/jira/browse/SOLR-9120
>
>M.
>
>
>-Original message-
>> From:Stephen Weiss >
>> Sent: Tuesday 17th May 2016 23:10
>> To: solr-user@lucene.apache.org; Aleksey 
>> Mezhva >; Hans Zhou 
>> >
>> Subject: Re: SolrCloud replicas consistently out of sync
>>
>> I should add - looking back through the logs, we're seeing frequent errors 
>> like this now:
>>
>> 78819692 WARN  (qtp110456297-1145) [   ] o.a.s.h.a.LukeRequestHandler Error 
>> getting file length for [segments_4o]
>> java.nio.file.NoSuchFileException: 
>> /var/solr/data/instock_shard5_replica1/data/index.20160516230059221/segments_4o
>>
>> --
>> Steve
>>
>>
>> On Tue, May 17, 2016 at 5:07 PM, Stephen Weiss 
>> >>
>>  wrote:
>> OK, so we did as you suggest, read through that article, and we reconfigured 
>> the autocommit to:
>>
>> <autoCommit>
>>   <maxTime>${solr.autoCommit.maxTime:3}</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> <autoSoftCommit>
>>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
>> </autoSoftCommit>
>>
>> However, we see no change, aside from the fact that it's clearly committing 
>> more frequently.  I will say on our end, we clearly misunderstood the 
>> difference between soft and hard commit, but even now having it configured 
>> this way, we are still totally out of sync, long after all indexing has 
>> completed (it's been about 30 minutes now).  We manually pushed through a 
>> commit on the whole collection as suggested, however, all we get back for 
>> that is o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping 
>> IW.commit., which makes sense, because it was all committed already anyway.
>>
>> We still currently have all shards mismatched:
>>
>> instock_shard1   replica 1: 30788491 replica 2: 30778865
>> instock_shard10   replica 1: 30973059 replica 2: 30971874
>> instock_shard11   replica 2: 31036815 replica 1: 31034715
>> instock_shard12   replica 2: 30177084 replica 1: 30170511
>> instock_shard13   replica 2: 30608225 replica 1: 30603923
>> instock_shard14   replica 2: 30755739 replica 1: 30753191
>> instock_shard15   replica 2: 30891713 replica 1: 30891528
>> instock_shard16   replica 1: 30818567 replica 2: 30817152
>> instock_shard17   replica 1: 30423877 replica 2: 30422742
>> instock_shard18   replica 2: 30874979 replica 1: 30872223
>> instock_shard19   replica 2: 30917208 replica 1: 3090
>> instock_shard2   replica 1: 31062339 replica 2: 31060575
>> instock_shard20   replica 1: 30192046 replica 2: 30190893
>> instock_shard21   replica 2: 30793817 replica 1: 30791135
>> instock_shard22   replica 2: 30821521 replica 1: 30818836
>> instock_shard23   replica 2: 30553773 replica 1: 30547336
>> instock_shard24   replica 1: 30975564 replica 2: 30971170
>> instock_shard25   replica 1: 30734696 replica 2: 30731682
>> instock_shard26   replica 1: 31465696 replica 2: 31464738
>> instock_shard27   replica 1: 30844884 replica 2: 30842445
>> instock_shard28   replica 2: 30549826 replica 1: 30547405
>> instock_shard29   replica 2: 3063 replica 1: 30634091
>> instock_shard3   replica 1: 30930723 replica 2: 30926483
>> 

Re: SolrCloud replicas consistently out of sync

2016-05-19 Thread Aleksey Mezhva
Bump.

this thread is with someone having a similar issue:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E

It seems like this is not really fixed in 5.4/6.0?


Aleksey

From: Steve Weiss 
Date: Tuesday, May 17, 2016 at 7:25 PM
To: "solr-user@lucene.apache.org" 
Cc: Aleksey Mezhva , Hans Zhou 
Subject: Re: SolrCloud replicas consistently out of sync

Gotcha - well that's nice.  Still, we seem to be permanently out of sync.

I see this thread with someone having a similar issue:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E

It seems like this is not really fixed in 5.4/6.0?  Is there any version of 
SolrCloud where this wasn't yet a problem that we could downgrade to?

--
Steve

On Tue, May 17, 2016 at 6:23 PM, Markus Jelsma 
> wrote:
Hi, thats a known issue and unrelated:
https://issues.apache.org/jira/browse/SOLR-9120

M.


-Original message-
> From:Stephen Weiss >
> Sent: Tuesday 17th May 2016 23:10
> To: solr-user@lucene.apache.org; Aleksey 
> Mezhva >; Hans Zhou 
> >
> Subject: Re: SolrCloud replicas consistently out of sync
>
> I should add - looking back through the logs, we're seeing frequent errors 
> like this now:
>
> 78819692 WARN  (qtp110456297-1145) [   ] o.a.s.h.a.LukeRequestHandler Error 
> getting file length for [segments_4o]
> java.nio.file.NoSuchFileException: 
> /var/solr/data/instock_shard5_replica1/data/index.20160516230059221/segments_4o
>
> --
> Steve
>
>
> On Tue, May 17, 2016 at 5:07 PM, Stephen Weiss 
> >>
>  wrote:
> OK, so we did as you suggest, read through that article, and we reconfigured 
> the autocommit to:
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:3}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
> </autoSoftCommit>
>
> However, we see no change, aside from the fact that it's clearly committing 
> more frequently.  I will say on our end, we clearly misunderstood the 
> difference between soft and hard commit, but even now having it configured 
> this way, we are still totally out of sync, long after all indexing has 
> completed (it's been about 30 minutes now).  We manually pushed through a 
> commit on the whole collection as suggested, however, all we get back for 
> that is o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping 
> IW.commit., which makes sense, because it was all committed already anyway.
>
> We still currently have all shards mismatched:
>
> instock_shard1   replica 1: 30788491 replica 2: 30778865
> instock_shard10   replica 1: 30973059 replica 2: 30971874
> instock_shard11   replica 2: 31036815 replica 1: 31034715
> instock_shard12   replica 2: 30177084 replica 1: 30170511
> instock_shard13   replica 2: 30608225 replica 1: 30603923
> instock_shard14   replica 2: 30755739 replica 1: 30753191
> instock_shard15   replica 2: 30891713 replica 1: 30891528
> instock_shard16   replica 1: 30818567 replica 2: 30817152
> instock_shard17   replica 1: 30423877 replica 2: 30422742
> instock_shard18   replica 2: 30874979 replica 1: 30872223
> instock_shard19   replica 2: 30917208 replica 1: 3090
> instock_shard2   replica 1: 31062339 replica 2: 31060575
> instock_shard20   replica 1: 30192046 replica 2: 30190893
> instock_shard21   replica 2: 30793817 replica 1: 30791135
> instock_shard22   replica 2: 30821521 replica 1: 30818836
> instock_shard23   replica 2: 30553773 replica 1: 30547336
> instock_shard24   replica 1: 30975564 replica 2: 30971170
> instock_shard25   replica 1: 30734696 replica 2: 30731682
> instock_shard26   replica 1: 31465696 replica 2: 31464738
> instock_shard27   replica 1: 30844884 replica 2: 30842445
> instock_shard28   replica 2: 30549826 replica 1: 30547405
> instock_shard29   replica 2: 3063 replica 1: 30634091
> instock_shard3   replica 1: 30930723 replica 2: 30926483
> instock_shard30   replica 2: 30904528 replica 1: 30902649
> instock_shard31   replica 2: 31175813 replica 1: 31174921
> instock_shard32   replica 2: 30932837 replica 1: 30926456
> instock_shard4   replica 2: 30758100 replica 1: 30754129
> instock_shard5   replica 2: 31008893 replica 1: 31002581
> instock_shard6   replica 2: 31008679 replica 1: 31005380
> instock_shard7   replica 2: 30738468 replica 1: 30737795
> instock_shard8   replica 2: 30620929 replica 1: 30616715
> instock_shard9   replica 1: 31071386 replica 2: 31066956
>
> The fact that the min_rf numbers aren't coming back as 2 seems to indicate to 
> me that documents simply aren't making it to both 

Solr join between documents

2016-05-19 Thread elisabeth benoit
Hello all,

I was wondering if there was a solr solution for a problem I have (and I'm
not the only one I guess)

We use Solr as a search engine for addresses. We sometimes have requests
like, for instance:

street A close to street B City postcode

I was wondering if some kind of join between two documents is possible in
solr?

The query would be: find union of two documents matching all words in query.

Those documents have a latitude and a longitude, and we would fix a max
distance between two documents to be eligible for a join.

Is there a way to do this?

Best regards,
Elisabeth


Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Ahmet Arslan
Hi,

EmbeddedSolrServer bypasses the servlet container.
Please see : 
http://find.searchhub.org/document/a88f669d38513a76




On Thursday, May 19, 2016 6:23 PM, Roman Slavik  wrote:
Hi Ahmet,
thanks for your response, I appreciate it.

I thought that EmbeddedSolrServer is just a wrapper around Solr core
functionality. Solr 4.7.2 is (was?) distributed as a war file and I didn't
find any mention of a compatibility problem with Tomcat. 
Maybe with Jetty it would work slightly faster, but I don't think this
causes the problem we have.


Roman



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-4-7-2-slowing-down-over-time-tp4277519p429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Roman Slavik
Hi Ahmet,
thanks for your response, I appreciate it.

I thought that EmbeddedSolrServer is just a wrapper around Solr core
functionality. Solr 4.7.2 is (was?) distributed as a war file and I didn't
find any mention of a compatibility problem with Tomcat. 
Maybe with Jetty it would work slightly faster, but I don't think this
causes the problem we have.

Roman



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-4-7-2-slowing-down-over-time-tp4277519p429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Get documents having a boolean field:false or not having the field at all

2016-05-19 Thread Sebastian Riemer
Hi,

I've introduced a new boolean field "is_deleted_b_ns" on my objects which I 
index with Solr. I am using dynamic field definitions ("b" indicating Boolean, 
"ns" for "not stored").

Since the field did not exist while the index was built, none of my documents 
currently has that field indexed.

My queries from now on must always include this new boolean field: they either 
ask the index for is_deleted_b_ns:false or for is_deleted_b_ns:true. However, since the 
field is not yet indexed, both queries return 0 results.

I see two ways I could go from here:

1)  Rebuild the whole index so that all documents index this newly 
added field as well (time-consuming); the above queries will then return the 
expected results.

2)  Relax the query by OR-combining is_deleted_b_ns:false with 
-is_deleted_b_ns:[* TO *]

That would mean "give me the documents where the flag is false or where it does 
not exist at all"
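Option 2) can be sent as a single filter query. A minimal sketch of building the request parameters (the *:* anchor is what makes the purely negative clause valid as a standalone clause inside the OR; the field name is the one from above):

```python
from urllib.parse import urlencode

# "flag is false OR the field does not exist at all".
# A purely negative clause like -field:[* TO *] must be anchored
# with *:* to be a valid standalone clause inside the OR.
fq = "is_deleted_b_ns:false OR (*:* -is_deleted_b_ns:[* TO *])"

# Ready to append to a /select request.
params = urlencode({"q": "*:*", "fq": fq})
print(params)
```

Because fq results are cached in the filterCache, the extra range clause should mostly cost on the first evaluation.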

Doing 1) is OK for now since this is a big change and we're not in production 
yet. Doing 2) feels kind of bad since I don't know whether it's a big performance 
hit. I also don't like that it makes my program code react to the current state of 
the index: someday the index will be up to date again, and then I'd be left with 
this broader query logic in my program even though it is no longer needed.

However, 1) will be a problem once we are in production. Sure, we won't 
have schema changes this big all the time, but one never knows.

What's your opinion on this? May be there is another option as well?

Best regards,
Sebastian

Mit freundlichen Grüßen
Sebastian Riemer, BSc


LITTERA Software & Consulting GmbH
A-6060 Hall i.T., Haller Au 19a
Telefon: +43(0) 50 765 000, Fax: +43(0) 50 765 118
Sitz: Hall i.T., eingetragen beim Handelsgericht Innsbruck,
Firmenbuch-Nr. FN 295807k, geschäftsführender Gesellschafter: Albert 
Unterkircher

D-80637 München, Landshuter Allee 8-10
Telefon: +49(0) 89 919 29 122, Fax: +49(0) 89 919 29 123
Sitz: München, eingetragen beim Amtsgericht München
unter HRB 103698, Geschäftsführer: Albert Unterkircher
E-Mail: off...@littera.eu
Homepage: www.littera.eu




Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Roman Slavík
Hi,
thanks for your response!

We use javamelody for some basic statistics about app. Here are some graphs
from last 24 hours:
http://imgur.com/a/OQxnb

The first graph shows memory used by the application. The second graph shows
how search time rapidly increased.
At 13:40 a necessary app restart happened, and at 1:30 the index optimize job
started. Everything has been fine today, no restart needed.

Based on these graphs, I don't think the problem is tight memory.

Can you explain that point about autoCommit?
Currently we create a list of SolrInputDocuments, add them to EmbeddedSolrServer
and then call an explicit hard commit, EmbeddedSolrServer.commit(), with
both waitFlush and waitSearcher set to true (could that be the problem?).
So no autoCommit is used.
The main reason is that after somebody makes some document modifications,
we insert the documents' IDs into a database. After some time all these documents
are reindexed, committed and removed from the database. So everything is
persistent: modified documents are either in the database or already committed.
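For reference, switching to autoCommit (as Joel suggested) might look roughly like this in solrconfig.xml; the one-minute values below are illustrative placeholders, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush the tlog and fsync segments, but do not open
       a new searcher (openSearcher=false keeps it cheap) -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: makes changes visible to searches -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place the client only adds documents and never calls commit() itself, which avoids overlapping searchers from too-frequent explicit commits.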


Roman


2016-05-18 23:24 GMT+02:00 Joel Bernstein :

> One thing to investigate is whether your caches are too large and gradually
> filling up memory. It does sound like memory is getting tighter over time.
> A memory profiler would be helpful in figuring out memory issues.
>
> Moving to autoCommits would also eliminate any slowness due to overlapping
> searchers from committing too frequently.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 18, 2016 at 10:39 AM, Roman Slavík  wrote:
>
> > Hi all,
> >
> > we're using solr in our application and have problem that both searching
> > and indexing is slowing down over time.
> >
> > Versions:
> > -
> > Java 1.7
> > Solr 4.7.2
> > Lucene 4.1 (luceneMatchVersion param in solrconfig.xml)
> >
> > App architecture:
> > -
> > We don't use solr as a standalone application; we integrated it into our
> > app with the solrj library.
> > It has 1 CoreContainer with 3 cores (let's call them alpha, beta, gamma).
> > EmbeddedSolrServer is used as java class to work with each core.
> > Application does both searching and indexing.
> >
> > What's going on:
> > 
> > After Tomcat restart everything works great for 2-3 days. But after
> > this time solr (all cores) starts slowing down until it's unusable and we
> > have to restart Tomcat. Then it works fine again.
> > For example, search time for a really complex query is 1.5 s when it works
> > fine. But then it rises to more than 1 min. The same issue occurs with indexing:
> > first fast, but then slow.
> >
> > Searching:
> > --
> > Core alpha is used mainly for normal search. But sometimes for faceting
> too.
> > Beta and gamma are only for facets.
> > alpha: 25000 queries/day
> > beta: 7000 queries/day
> > gamma: 7000 queries/day
> > We do lots of query joins, sometimes cross cores.
> >
> > Indexing:
> > -
> > We commit changes continuously over the day. The number of commits is limited
> > to 1 commit/min for all three cores. So we do at most 1440 commits daily. One
> > commit contains between 1 and 100 docs.
> > Method EmbeddedSolrServer.add(SolrInputDocument) is used and in the end
> > EmbeddedSolrServer.commit().
> > Every night we call EmbeddedSolrServer.optimize() on each core.
> >
> > Index size:
> > ---
> > alpha: 13,5 GB
> > beta: 300 MB
> > gamma: 600 MB
> >
> > Hardware:
> > -
> > Ubuntu 14.04
> > 8 core CPU
> > java heap space 22 GB RAM
> > SSD drive with more than 50 GB free space
> >
> > Solr config (Same configuration is used for all cores):
> > ---
> >  > class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
> > LUCENE_41
> > 
> > false
> > 10
> >
> > 32
> > 1
> > 1000
> > 1
> >
> > native
> > false
> > true
> >
> > 
> >   1
> >   0
> > 
> > 
> >
> > 
> >
> >  > autowarmCount="0"/>
> >  > autowarmCount="0"/>
> >  > autowarmCount="0"/>
> >
> > true
> > false
> > 2
> >
> >
> > Conclusion:
> > ---
> > Is something wrong in configuration? Or is this some kind of bug? Or...?
> > Can you give me some advice how to resolve this problem please?
> >
> >
> > Roman
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>


Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Abdel Belkasri
Hi Renaud,

I was not reloading the collection, but I just did. It didn't help.

The error is always the same:


   - *gettingstarted_shard1_replica2:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
   Error Instantiating Update Handler, solr.DirectUpdateHandler2 failed to
   instantiate org.apache.solr.update.UpdateHandler


On Thu, May 19, 2016 at 9:17 AM, Renaud Delbru 
wrote:

> Hi Abdel,
>
> have you reloaded the collection [1] after uploading the configuration to
> zookeeper ?
>
> [1]
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2
>
> --
> Renaud Delbru
>
>
> On 16/05/16 17:29, Abdel Belkasri wrote:
>
>> Thanks Renaud.
>>
>> Here is my setup:
>>
>> 1- I have created 2 sites: Main (source) and DR (target).
>> 2- Both sites are the same before configuring CDCR
>> 3- The collections (source and target) are created before configuring CDCR
>> 4- collections are created using interactive mode: accepting most defaults
>> except the ports (gettingstarted collection)
>> 5- I have a zookeeper ensemble too.
>> 6- I change the solrconfig.xml, then I upload using the command:
>> # upload configset to zookeeper
>> zkcli.bat -cmd upconfig -zkhost  localhost:2181 -confname gettingstarted
>> -solrhome C:\solr\solr-6-cloud\solr-6.0.0 -confdir
>> C:\solr\solr-6-cloud\solr-6.0.0\server\solr\configsets\basic_configs\conf
>>
>> Renaud can you send your confi files...
>>
>> Thanks,
>> --Abdel.
>>
>> On Mon, May 16, 2016 at 12:16 PM, Satvinder Singh <
>> satvinder.si...@nc4.com>
>> wrote:
>>
>> Thank you.
>>>
>>> To summarize this is what I have, all VMS running on Centos7 :
>>>
>>> Source Side
>>>  |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
>>> 2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
>>>  |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
>>> 6.0) (Java 1.8.0_91)
>>>  |___ sample_techproducts_config copied as 'liferay', and used to
>>> create collections, that is where I am
>>>   modifying the solrconfig.xml
>>>
>>>
>>> Target Side
>>>  |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
>>> 2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
>>>  |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
>>> 6.0) (Java 1.8.0_91)
>>>  |___ sample_techproducts_config copied as 'liferay', and used to
>>> create collections, that is where I am
>>>   modifying the solrconfig.xml
>>>
>>>
>>> Thanks
>>> Satvinder Singh
>>> Security Systems Engineer
>>> satvinder.si...@nc4.com
>>> 703.682.6000 x276 direct
>>> 703.989.8030 cell
>>> www.NC4.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -Original Message-
>>> From: Renaud Delbru [mailto:renaud@siren.solutions]
>>> Sent: Monday, May 16, 2016 11:59 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
>>>
>>> Thanks Satvinder,
>>> Tomorrow, I'll try to reproduce the issue with your steps and will let
>>> you
>>> know.
>>>
>>> Regards
>>> --
>>> Renaud Delbru
>>>
>>> On 16/05/16 16:53, Satvinder Singh wrote:
>>>
 Hi,

 So the way I am doing it is, for both for the Target and Source side, I

>>> took a copy of the sample_techproducts_config configset and created one
>>> configset. Then I modified the solrconfig.xml in there, both for the
>>> Target
>>> and Source side. And then created the collection, and I get the errors. I
>>> get the error if I create a new collection or try to reload an existing
>>> collection after the solrconfig update.
>>>
 Attached is the log and configs.
 Thanks

 Satvinder Singh



 Security Systems Engineer
 satvinder.si...@nc4.com
 703.682.6000 x276 direct
 703.989.8030 cell
 www.NC4.com








 -Original Message-
 From: Renaud Delbru [mailto:renaud@siren.solutions]
 Sent: Monday, May 16, 2016 11:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

 Hi,

 I have tried to reproduce the problem, but was unable to.
 I have downloaded the Solr 6.0 distribution, added to the solr config

>>> the cdcr request handler and modified the update handler to register the
>>> CdcrUpdateLog, then start Solr in cloud mode and created a new collection
>>> using my solr config. The cdcr request handler starts properly and does
>>> not
>>> complain about the update log.
>>>

 Could you provide more background on how to reproduce the issue ? E.g.,

>>> how do you create a new collection with the cdcr configuration.
>>>
 Are you trying to configure CDCR on collections that were created prior

>>> to the CDCR configuration ?
>>>

 @Erik: I have noticed a small issue in the CDCR page of the reference

>>> guide. In the code snippet in Configuration -> Source Configuration, the
>>>  element is nested within the .
>>>

Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Abdel Belkasri
Thanks Renaud.

For me I did this:

1) I created solr cloud using the interactive mode (accepting most defaults):
$ bin/solr -e cloud

I do that for both the DataCenter1 (DC1) and DataCenter2 (DC2) (in fact
both are on the same machine, just different ports)
By now both clouds have a collection called gettingstarted, each cloud has
2 nodes, 2 shards

2) Each dc is using an ensemble of 3 zookeepers

3) I used the basic config that comes with the install and I upload
configset to zookeeper using:
zkcli.bat -cmd upconfig -zkhost  localhost:2181 -confname gettingstarted
-solrhome C:\solr\solr-6-cloud\solr-6.0.0 -confdir
C:\solr\solr-6-cloud\solr-6.0.0\server\solr\configsets\basic_configs\conf

4) I tested the sites both works fine

5) Then I shut them all (except the zookeepers)

6) change the solrconfig.xml for DC1 to be source

7) change the solrconfig.xml for DC2 to be destination

8) I upload the confg again (update) to zookepper using command in 3

9) then start the 2 clouds using:
# start node 1
solr.cmd start -cloud -p 8985 -s
"C:\solr\solr-6-cloud\solr-6.0.0-dr\example\cloud\node1\solr" -z
localhost:3181
# start node 2
solr.cmd start -cloud -p 7575 -s
"C:\solr\solr-6-cloud\solr-6.0.0-dr\example\cloud\node2\solr" -z
localhost:3181

10) do the same for DC2

The clouds both throw that error about update log that we started with


A question: at what point do you enable CDCR?

Best Regards,
--Abdel.




On Thu, May 19, 2016 at 7:12 AM, Renaud Delbru 
wrote:

> I have reproduced your steps and the cdcr request handler started
> successfully. I have attached to this mail the config sets I have used. It
> is simply the sample_techproducts_config configset with your solrconfig.xml.
>
> I have used solr 6.0.0 with the following commands:
>
> $ ./bin/solr start -cloud
>
> $ ./bin/solr create_collection -c test_cdcr -d cdcr_configs
>
> Connecting to ZooKeeper at localhost:9983 ...
> Uploading /solr-6.0.0/server/solr/configsets/cdcr_configs/conf for config
> test_cdcr to ZooKeeper at localhost:9983
>
> Creating new collection 'test_cdcr' using command:
>
> http://localhost:8983/solr/admin/collections?action=CREATE=test_cdcr=1=1=1=test_cdcr
>
> {
>   "responseHeader":{
> "status":0,
> "QTime":5765},
>   "success":{"127.0.1.1:8983_solr":{
>   "responseHeader":{
> "status":0,
> "QTime":4426},
>   "core":"test_cdcr_shard1_replica1"}}}
>
> $ curl http://localhost:8983/solr/test_cdcr/cdcr?action=STATUS
>
> <response>
>   <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
>   <lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
> </response>
>
>
>
> The difference is that I have used the embedded zookeeper, not a separate
> ensemble.
>
> Could you please provide the commands you used to create the collection ?
>
> Kind Regards
> --
> Renaud Delbru
>
>
> On 16/05/16 16:55, Satvinder Singh wrote:
>
>> I also am using a zk ensemble with 3 nodes on each side.
>>
>> Thanks
>>
>>
>> Satvinder Singh
>>
>>
>>
>> Security Systems Engineer
>> satvinder.si...@nc4.com
>> 703.682.6000 x276 direct
>> 703.989.8030 cell
>> www.NC4.com
>>
>>
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Satvinder Singh [mailto:satvinder.si...@nc4.com]
>> Sent: Monday, May 16, 2016 11:54 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Need Help with Solr 6.0 Cross Data Center Replication
>>
>> Hi,
>>
>> So the way I am doing it is, for both the Target and Source side, I
>> took a copy of the sample_techproducts_config configset and created one
>> configset. Then I modified the solrconfig.xml in there, both for the Target
>> and Source side. And then created the collection, and I get the errors. I
>> get the error if I create a new collection or try to reload an existing
>> collection after the solrconfig update.
>> Attached is the log and configs.
>> Thanks
>>
>> Satvinder Singh
>>
>>
>>
>> Security Systems Engineer
>> satvinder.si...@nc4.com
>> 703.682.6000 x276 direct
>> 703.989.8030 cell
>> www.NC4.com
>>
>>
>>
>>
>>
>>
>>
>> -Original Message-
>> From: Renaud Delbru [mailto:renaud@siren.solutions]
>> Sent: Monday, May 16, 2016 11:45 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
>>
>> Hi,
>>
>> I have tried to reproduce the problem, but was unable to.
>> I have downloaded the Solr 6.0 distribution, added to the solr config the
>> cdcr request handler and modified the update handler to register the
>> CdcrUpdateLog, then start Solr in cloud mode and created a new collection
>> using my solr config. The cdcr request handler starts properly and does not
>> complain about the update log.
>>
>> Could you provide more background on how to reproduce the issue ? E.g.,
>> how do you create a new collection with the cdcr configuration.
>> Are you trying to configure CDCR on collections that were created prior
>> to the CDCR configuration ?
>>
>> @Erik: I have noticed a small issue in the CDCR page of the reference
>> guide. In the code snippet in Configuration -> Source Configuration, the
>>  

Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-19 Thread Steve Rowe
Peter,

It’s an interesting idea.  Could you make a Solr JIRA?

I don’t know where the field type specification would go, but providing a 
mechanism to specify field type for previously non-existent fields, outside of 
the field names themselves, seems useful.

In the meantime, do you know about field aliasing?  

1. You can get results back that rename fields to whatever you want: see the 
section “Field Name Aliases” here: 
.

2. On the query side, eDisMax can perform aliasing so that user-specified field 
names in queries get mapped to one or more indexed fields: look for “alias” in 
.
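For example, point 2 might be wired up as eDisMax request-handler defaults (the `color`/`color_s` names here are hypothetical, matching Peter's earlier example; the same parameters can also be passed per request):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- a user can type color:red; eDisMax rewrites it to search color_s -->
    <str name="f.color.qf">color_s</str>
  </lst>
</requestHandler>
```

Combined with response-side aliasing in fl, the postfixed names need never appear in user-facing queries or results.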

--
Steve
www.lucidworks.com

> On May 19, 2016, at 4:43 AM, Horváth Péter Gergely 
>  wrote:
> 
> Hi Steve,
> 
> Yes, I know the schema API, however I do not want to specify the field type
> programmatically for every single field.
> 
> I would like to be able to specify the field type when it is being added
> (similar to the name postfixes, but without affecting the field names).
> 
> Thanks,
> Peter
> 
> 
> 2016-05-17 17:08 GMT+02:00 Steve Rowe :
> 
>> Hi Peter,
>> 
>> Are you familiar with the Schema API?: <
>> https://cwiki.apache.org/confluence/display/solr/Schema+API>
>> 
>> You can use it to create fields, field types, etc. prior to ingesting your
>> data.
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
>> peter.gergely.horv...@gmail.com> wrote:
>>> 
>>> Hi All,
>>> 
>>> By default Solr allows you to define the type of a dynamic field by
>>> appending a post-fix to the name itself. E.g. creating a color_s field
>>> instructs Solr to create a string field. My understanding is that if we
>> do
>>> this, all queries must refer the post-fixed field name as well. So
>>> instead of a query like color:"red", we will have to write something like
>>> color_s:"red" -- and so on for other field types as well.
>>> 
>>> I am wondering if it is possible to specify the data type used for a
>> field
>>> in Solr 6.0.0, without having to modify the field name. (Or at least in a
>>> way that would allow us to use the original field name) Do you have any
>>> idea, how to achieve this? I am fine, if we have to specify the field
>> type
>>> during the insertion of a document, however, I do not want to keep using
>>> post-fixes while running queries...
>>> 
>>> Thanks,
>>> Peter
>> 
>> 



Re: Sorting on child document field.

2016-05-19 Thread Pranaya Behera

An example would be:
Let's say that I have a product document with regular fields such as name, 
price, desc, is_parent. It has child documents such as

CA: fields a, b, c, rank
and another child document,
CB: fields x, y, z.
I am using the query {!parent which="is_parent:true"}a:some AND 
b:somethingelse; here only the CA child documents are used for searching, no 
other child document is touched. CA has a rank field. I want to 
sort the parents using this field.
A product contains multiple CA documents, but the query matches exactly one 
document.


On Thursday 19 May 2016 04:09 PM, Pranaya Behera wrote:
While searching the Lucene code base I found 
ToParentBlockJoinSortField, but it's not in Solr or even in SolrJ. 
How would I use it with SolrJ? I can't find any way to 
query it through the UI.


On Thursday 19 May 2016 11:29 AM, Pranaya Behera wrote:

Hi,

 How can I sort the results, i.e. from a block join parent query, 
using a field from a child document?


Thanks & Regards

Pranaya Behera







Re: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Renaud Delbru

Hi Abdel,

have you reloaded the collection [1] after uploading the configuration 
to zookeeper ?


[1] 
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2


--
Renaud Delbru

On 16/05/16 17:29, Abdel Belkasri wrote:

Thanks Renaud.

Here is my setup:

1- I have created 2 sites: Main (source) and DR (target).
2- Both sites are the same before configuring CDCR
3- The collections (source and target) are created before configuring CDCR
4- collections are created using interactive mode: accepting most defaults
except the ports (gettingstarted collection)
5- I have a zookeeper ensemble too.
6- I change the solrconfig.xml, then I upload using the command:
# upload configset to zookeeper
zkcli.bat -cmd upconfig -zkhost  localhost:2181 -confname gettingstarted
-solrhome C:\solr\solr-6-cloud\solr-6.0.0 -confdir
C:\solr\solr-6-cloud\solr-6.0.0\server\solr\configsets\basic_configs\conf

Renaud can you send your confi files...

Thanks,
--Abdel.

On Mon, May 16, 2016 at 12:16 PM, Satvinder Singh 
wrote:


Thank you.

To summarize this is what I have, all VMS running on Centos7 :

Source Side
 |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
 |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
6.0) (Java 1.8.0_91)
 |___ sample_techproducts_config copied as 'liferay', and used to
create collections, that is where I am
  modifying the solrconfig.xml


Target Side
 |___ 1 VM running 3 Zookeeper instances on port 2181, 2182 and
2183 (ZOOKEEPER 3.4.8)(Java 1.8.0_91)
 |___ 1 VM running 2 solr 6.0 instances on port 8501, 8502 (Solr
6.0) (Java 1.8.0_91)
 |___ sample_techproducts_config copied as 'liferay', and used to
create collections, that is where I am
  modifying the solrconfig.xml


Thanks
Satvinder Singh
Security Systems Engineer
satvinder.si...@nc4.com
703.682.6000 x276 direct
703.989.8030 cell
www.NC4.com








-Original Message-
From: Renaud Delbru [mailto:renaud@siren.solutions]
Sent: Monday, May 16, 2016 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

Thanks Satvinder,
Tomorrow, I'll try to reproduce the issue with your steps and will let you
know.

Regards
--
Renaud Delbru

On 16/05/16 16:53, Satvinder Singh wrote:

Hi,

So the way I am doing it is, for both the Target and Source side, I

took a copy of the sample_techproducts_config configset and created one
configset. Then I modified the solrconfig.xml in there, both for the Target
and Source side. And then created the collection, and I get the errors. I
get the error if I create a new collection or try to reload an existing
collection after the solrconfig update.

Attached is the log and configs.
Thanks

Satvinder Singh



Security Systems Engineer
satvinder.si...@nc4.com
703.682.6000 x276 direct
703.989.8030 cell
www.NC4.com








-Original Message-
From: Renaud Delbru [mailto:renaud@siren.solutions]
Sent: Monday, May 16, 2016 11:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

Hi,

I have tried to reproduce the problem, but was unable to.
I have downloaded the Solr 6.0 distribution, added to the solr config

the cdcr request handler and modified the update handler to register the
CdcrUpdateLog, then start Solr in cloud mode and created a new collection
using my solr config. The cdcr request handler starts properly and does not
complain about the update log.


Could you provide more background on how to reproduce the issue ? E.g.,

how do you create a new collection with the cdcr configuration.

Are you trying to configure CDCR on collections that were created prior

to the CDCR configuration ?


@Erik: I have noticed a small issue in the CDCR page of the reference

guide. In the code snippet in Configuration -> Source Configuration, the
 element is nested within the .


Thanks
Regards
--
Renaud Delbru

On 15/05/16 23:13, Abdel Belkasri wrote:

Erick,

I tried the new configuration. The same issue that Satvinder is
having. The log updater cannot be instantiated...

class="solr.CdcrUpdateLog"

for some reason that class is causing a problem!

Anyway, anyone has a config that works?

Regards,
--Abdel

On Fri, May 13, 2016 at 11:57 AM, Erick Erickson

wrote:


I changed the CDCR doc, Oliver could you take a glance and see if it
is clear now? All I changed was the sample solrconfig sections

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462

Thanks,
Erick

On Fri, May 13, 2016 at 6:23 AM, Oliver Rudolph
 wrote:

Hi,

I had the same problem. The documentation is kind of misleading here.

You

must not add a new <updateLog> element to your config but
update the existing <updateLog>. All you need to do is add the
class="solr.CdcrUpdateLog" attribute to the <updateLog> element
inside your existing <updateHandler>. Hope this helps.
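Sketched concretely, the fix amounts to one attribute on the existing updateLog (this assumes the stock DirectUpdateHandler2 and the default ulog dir value from the sample configs):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- keep the ONE existing updateLog; only add the class attribute.
       Adding a second updateLog element triggers the
       "Error Instantiating Update Handler" failure discussed above. -->
  <updateLog class="solr.CdcrUpdateLog">
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```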

Creating dynamic field, but strip the type indicator postfix from the name

2016-05-19 Thread Horváth Péter Gergely
Hi Everyone,

I am wondering if it is possible to store dynamic fields without the type
indicator postfix. In our Solr environment, I would like to
1.) use dynamic fields ("data-driven collections" with no fixed fields
specified in advance)
2.) be able to specify the field type, but without interacting with Solr
schema API separately
3.) I do not want type indicator postfixes to appear in field names, since
that would make queries more complicated to form.

For example, I could imagine that when I add a document like this:


CloudSolrClient solr  = ...

SolrInputDocument document = new SolrInputDocument();

document.addField("foobar_s", "theValue")

solr.add("someCollection", document);

Solr would recognize that the "foobar_s" field name contains a type
indicator and it would create a field named "foobar" with the type of
string.

Is there any way to achieve this behavior (have Solr create the field with
the requested type, but without the type indicator postfix)? If yes, how
could I do that?
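One partial workaround that exists today is response-side field aliasing via the fl parameter; it hides the postfix in results only, while q still needs the real field name. A sketch, reusing the hypothetical foobar_s field from above:

```python
from urllib.parse import urlencode

# fl's alias syntax (alias:field) renames fields in the *response*,
# so callers never see the _s postfix in the returned documents.
params = urlencode({
    "q": "foobar_s:theValue",    # query side still uses the indexed name
    "fl": "id,foobar:foobar_s",  # response side: returned as "foobar"
})
print(params)
```

It is not the behavior asked for here (the postfix still leaks into queries), but it keeps result documents clean without any schema work.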

Thanks,
Peter


RE: Need Help with Solr 6.0 Cross Data Center Replication

2016-05-19 Thread Satvinder Singh
Hi,

So this is what I did:

I created solr as a service. Below are the steps I followed for that:

$ tar xzf solr-X.Y.Z.tgz solr-X.Y.Z/bin/install_solr_service.sh 
--strip-components=2

$ sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz -i /opt/solr1 -d 
/var/solr1 -u solr -s solr1 -p 8501
$ sudo bash ./install_solr_service.sh solr-X.Y.Z.tgz -i /opt/solr2 -d 
/var/solr2 -u solr -s solr2 -p 8502

Then to start it in cloud I modified the solr1.cmd.in and solr2.cmd.in in 
/etc/defaults/
I added ZK_HOST=192.168.56.103:2181,192.168.56.103:2182,192.168.56.103:2183 
(192.168.56.103 is where my 3 zookeeper instances are)

Then I started the 2 solr services solr1 and solr2

Then I created the configset
/bin/solr zk -upconfig -z 
192.168.56.103:2181,192.168.56.103:2182,192.168.56.103:2183 -n Liferay -d 
server/solr/configsets/sample_techproducts_configs/conf

Then I created the collection using:
http://192.168.56.101:8501/solr/admin/collections?action=CREATE=dingdong=1=2=liferay
This created fine

Then I deleted the solrconfig.xml from the zookeeper Liferay configset

Then I uploaded the new solrconfig.xml to the configset. 

Then when I do a reload on the collections I get the error, or if I create a new 
collection I get the error.

Thanks

Satvinder Singh
 
 
 
Security Systems Engineer
satvinder.si...@nc4.com
703.682.6000 x276 direct
703.989.8030 cell
www.NC4.com
 
 

  
 



-Original Message-
From: Renaud Delbru [mailto:renaud@siren.solutions] 
Sent: Thursday, May 19, 2016 7:13 AM
To: solr-user@lucene.apache.org
Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication

I have reproduced your steps and the cdcr request handler started successfully. 
I have attached to this mail the config sets I have used. 
It is simply the sample_techproducts_config configset with your solrconfig.xml.

I have used solr 6.0.0 with the following commands:

$ ./bin/solr start -cloud

$ ./bin/solr create_collection -c test_cdcr -d cdcr_configs

Connecting to ZooKeeper at localhost:9983 ...
Uploading /solr-6.0.0/server/solr/configsets/cdcr_configs/conf for config 
test_cdcr to ZooKeeper at localhost:9983

Creating new collection 'test_cdcr' using command:
http://localhost:8983/solr/admin/collections?action=CREATE=test_cdcr=1=1=1=test_cdcr

{
   "responseHeader":{
 "status":0,
 "QTime":5765},
   "success":{"127.0.1.1:8983_solr":{
   "responseHeader":{
 "status":0,
 "QTime":4426},
   "core":"test_cdcr_shard1_replica1"}}}

$ curl http://localhost:8983/solr/test_cdcr/cdcr?action=STATUS



<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">3</int></lst>
  <lst name="status"><str name="process">stopped</str><str name="buffer">enabled</str></lst>
</response>



The difference is that I have used the embedded zookeeper, not a separate 
ensemble.

Could you please provide the commands you used to create the collection ?

Kind Regards
--
Renaud Delbru

On 16/05/16 16:55, Satvinder Singh wrote:
> I also am using a zk ensemble with 3 nodes on each side.
>
> Thanks
>
>
> Satvinder Singh
>
>
>
> Security Systems Engineer
> satvinder.si...@nc4.com
> 703.682.6000 x276 direct
> 703.989.8030 cell
> www.NC4.com
>
>
>
>
>
> 
>
>
> -Original Message-
> From: Satvinder Singh [mailto:satvinder.si...@nc4.com]
> Sent: Monday, May 16, 2016 11:54 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Need Help with Solr 6.0 Cross Data Center Replication
>
> Hi,
>
> So the way I am doing it is, for both the Target and Source side, I took 
> a copy of the sample_techproducts_config configset and created one 
> configset. Then I modified the solrconfig.xml in there, both for the Target 
> and Source side. And then created the collection, and I get the errors. I get 
> the error if I create a new collection or try to reload an existing 
> collection after the solrconfig update.
> Attached is the log and configs.
> Thanks
>
> Satvinder Singh
>
>
>
> Security Systems Engineer
> satvinder.si...@nc4.com
> 703.682.6000 x276 direct
> 703.989.8030 cell
> www.NC4.com
>
>
>
>
>
> 
>
>
> -Original Message-
> From: Renaud Delbru [mailto:renaud@siren.solutions]
> Sent: Monday, May 16, 2016 11:45 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
>
> Hi,
>
> I have tried to reproduce the problem, but was unable to.
> I have downloaded the Solr 6.0 distribution, added to the solr config the 
> cdcr request handler and modified the update handler to register the 
> CdcrUpdateLog, then start Solr in cloud mode and created a new collection 
> using my solr config. The cdcr request handler starts properly and does not 
> complain about the update log.
>
> Could you provide more background on how to reproduce the issue ? E.g., how 
> do you create a new collection with the cdcr configuration.
> Are you trying to configure CDCR on collections that were created prior to 
> the CDCR configuration ?
>
> @Erik: I have noticed a small issue in the CDCR page of the reference guide. 
> In the code snippet in Configuration -> Source Configuration, the 
>  element is nested within the .
>
> Thanks
> Regards

Re: SolrCloud 6 Join Stream and pagination

2016-05-19 Thread Joel Bernstein
Currently there isn't a way to page the results. There is a jira ticket to
support this for SQL:

https://issues.apache.org/jira/browse/SOLR-9078

This will be implemented first as a Streaming Expression, so it's likely to
be available soon. It should be straightforward to implement the offset()
function if you feel like working up a patch.
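Until that lands, paging has to be emulated client-side. As a rough illustration of what an offset()/limit() wrapper does conceptually (plain Python, not Solr streaming code): the skipped tuples are still pulled from the underlying stream and discarded, which is why deep offsets over a stream stay expensive.

```python
from itertools import islice

def offset_stream(tuples, offset, limit=None):
    """Skip the first `offset` tuples, then yield up to `limit` more."""
    it = iter(tuples)
    # The skipped tuples are still read from the underlying stream;
    # they are simply discarded, so cost grows with the offset.
    for _ in islice(it, offset):
        pass
    yield from (it if limit is None else islice(it, limit))

# Page 3 of a (simulated) joined stream, 10 tuples per page:
stream = ({"id": i} for i in range(100))
page = list(offset_stream(stream, offset=20, limit=10))
# page holds the tuples with ids 20..29
```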

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, May 19, 2016 at 2:32 AM, Roshan Kamble <
roshan.kam...@smartstreamrdu.com> wrote:

> Hello,
>
> I am using Solr 6 in cloud mode.
> In order to search within different collections I am using
> InnerJoinStream. (using qt=export in order to get correct result)
>
> Is there any way to get paginated result?
>
>
> Regards,
> Roshan
> 
> The information in this email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this email
> by anyone else is unauthorised. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful.
>


Determine Containing Handler

2016-05-19 Thread Max Bridgewater
Hi,

I am implementing a component that needs to redirect calls to the handler
that originally called it. Say the call comes to handler /search; the
component would then do some processing, alter the query, and send the
query back to /search again.

It works great. The only issue is that the handler is not always called
/search, leading me to have to force people to pass the handler name as
parameter to the component, which is not ideal.

The question thus is: is there a way to find out what handler a component
was invoked from?

I checked SolrCore and SolrQueryRequest but I can't seem to find a method
that would do this.

Thanks,
Max.


Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
On query side, right above SOF there is SynonymFilter (SF is not present on
indexing). It does the following:
organization -> organization, organisation

SOF turns this pair into: organiz, organis.
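To make the mismatch concrete, here is a toy simulation (plain Python, not the actual Lucene filters) of the two chains as described: the index side stems organization down to organ, while the query side expands the synonym pair and then applies the override dictionary. Under this model the index-side and query-side terms never meet, which is exactly why the observed match is puzzling.

```python
# Index-side chain: SnowballPorter stems "organization" -> "organ".
def index_stem(token):
    return "organ" if token.startswith("organiza") else token

# Query-side chain: SynonymFilter expansion, then StemmerOverrideFilter.
SYNONYMS = {"organization": ["organization", "organisation"]}
OVERRIDES = {"organization": "organiz", "organisation": "organis"}

def query_analyze(token):
    expanded = SYNONYMS.get(token, [token])
    # The override dictionary replaces each token (and would normally
    # mark it as a keyword so later stemmers leave it alone).
    return [OVERRIDES.get(t, t) for t in expanded]

indexed = index_stem("organization")         # "organ"
query_terms = query_analyze("organization")  # ["organiz", "organis"]
should_match = indexed in query_terms        # False
```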


On Thu, May 19, 2016 at 2:18 PM, Markus Jelsma 
wrote:

> Hello - is there a KeywordRepeatFilterFactory above the StemmerOverride?
> That would explain it.
> M.
>
>
>
> -Original message-
> > From:Dmitry Kan 
> > Sent: Thursday 19th May 2016 13:08
> > To: solr-user@lucene.apache.org
> > Subject: Re: puzzling StemmerOverrideFilterFactory
> >
> > Hi,
> >
> > Yes, I have checked the analysis page and there everything is logical,
> > stemming is done as expected. So by analysis page the search should not
> > return anything.
> >
> > On Thu, May 19, 2016 at 12:14 PM, Markus Jelsma <
> markus.jel...@openindex.io>
> > wrote:
> >
> > > Hello - that sounds odd indeed. Did you check query and indexing
> analysis?
> > > M.
> > >
> > >
> > >
> > > -Original message-
> > > > From:Dmitry Kan 
> > > > Sent: Thursday 19th May 2016 9:36
> > > > To: solr-user@lucene.apache.org
> > > > Subject: puzzling StemmerOverrideFilterFactory
> > > >
> > > > Hello!
> > > >
> > > > Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory"
> > > > dictionary="stemdict.txt" /> on query side, but not indexing. One
> rule is
> > > > mapping organization onto organiz (on query). On indexing
> > > > SnowballPorterFilterFactory will stem organization to organ. Still
> > > > searching with organization finds it in the index. Anybody has an
> idea
> > > why
> > > > this happens?
> > > >
> > > > This is on solr 4.10.2.
> > > >
> > > > Thanks,
> > > > Dmitry
> > > >
> > > > --
> > > > Dmitry Kan
> > > > Luke Toolbox: http://github.com/DmitryKey/luke
> > > > Blog: http://dmitrykan.blogspot.com
> > > > Twitter: http://twitter.com/dmitrykan
> > > > SemanticAnalyzer: www.semanticanalyzer.info
> > > >
> > >
> >
> >
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Markus Jelsma
Hello - is there a KeywordRepeatFilterFactory above the StemmerOverride? That 
would explain it.
M.

 
 
-Original message-
> From:Dmitry Kan 
> Sent: Thursday 19th May 2016 13:08
> To: solr-user@lucene.apache.org
> Subject: Re: puzzling StemmerOverrideFilterFactory
> 
> Hi,
> 
> Yes, I have checked the analysis page and there everything is logical,
> stemming is done as expected. So by analysis page the search should not
> return anything.
> 
> On Thu, May 19, 2016 at 12:14 PM, Markus Jelsma 
> wrote:
> 
> > Hello - that sounds odd indeed. Did you check query and indexing analysis?
> > M.
> >
> >
> >
> > -Original message-
> > > From:Dmitry Kan 
> > > Sent: Thursday 19th May 2016 9:36
> > > To: solr-user@lucene.apache.org
> > > Subject: puzzling StemmerOverrideFilterFactory
> > >
> > > Hello!
> > >
> > > Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory"
> > > dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
> > > mapping organization onto organiz (on query). On indexing
> > > SnowballPorterFilterFactory will stem organization to organ. Still
> > > searching with organization finds it in the index. Anybody has an idea
> > why
> > > this happens?
> > >
> > > This is on solr 4.10.2.
> > >
> > > Thanks,
> > > Dmitry
> > >
> > > --
> > > Dmitry Kan
> > > Luke Toolbox: http://github.com/DmitryKey/luke
> > > Blog: http://dmitrykan.blogspot.com
> > > Twitter: http://twitter.com/dmitrykan
> > > SemanticAnalyzer: www.semanticanalyzer.info
> > >
> >
> 
> 
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Suspicious message with attachment

2016-05-19 Thread help
The following message addressed to you was quarantined because it likely 
contains a virus:

Subject: Re: Need Help with Solr 6.0 Cross Data Center Replication
From: Renaud Delbru 

However, if you know the sender and are expecting an attachment, please reply 
to this message, and we will forward the quarantined message to you.


Re: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
Hi,

Yes, I have checked the analysis page and there everything is logical,
stemming is done as expected. So by analysis page the search should not
return anything.

On Thu, May 19, 2016 at 12:14 PM, Markus Jelsma 
wrote:

> Hello - that sounds odd indeed. Did you check query and indexing analysis?
> M.
>
>
>
> -Original message-
> > From:Dmitry Kan 
> > Sent: Thursday 19th May 2016 9:36
> > To: solr-user@lucene.apache.org
> > Subject: puzzling StemmerOverrideFilterFactory
> >
> > Hello!
> >
> > Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory"
> > dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
> > mapping organization onto organiz (on query). On indexing
> > SnowballPorterFilterFactory will stem organization to organ. Still
> > searching with organization finds it in the index. Anybody has an idea
> why
> > this happens?
> >
> > This is on solr 4.10.2.
> >
> > Thanks,
> > Dmitry
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Sorting on child document field.

2016-05-19 Thread Pranaya Behera
While searching in the Lucene code base I found ToParentBlockJoinSortField, 
but it's not in Solr or even in SolrJ. How would I use it with SolrJ, as I 
can't find anything to query it through the UI?


On Thursday 19 May 2016 11:29 AM, Pranaya Behera wrote:

Hi,

 How can I sort the results i.e. from a block join parent query 
using the field from child document field ?


Thanks & Regards

Pranaya Behera





Re: Highlighting phone numbers

2016-05-19 Thread marotosg
Thanks. Using the debug query returns the info I need.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Highlighting-phone-numbers-tp4277491p4277712.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [scottchu] How to specify multiple zk nodes using solrstartcommand under Windows

2016-05-19 Thread scott.chu

I find a way to stop zk nodes peacefully under Windows.

If you start zk nodes in order of 1,2,3
then you should stop them in reverse order, i.e. 3,2,1
Thus you can stop the processes peacefully without any errors.

Hope this helps people!

scott.chu,scott@udngroup.com
2016/5/19 (Thu)
- Original Message - 
From: John Bickerstaff 
To: solr-user ; scott(自己) 
CC: 
Date: 2016/5/18 (Wed) 13:53
Subject: Re: [scottchu] How to specify multiple zk nodes using solrstartcommand 
under Windows


I think those zk server warning messages are expected. Until you have 3 
running instances you don't have a "Quorum" and the Zookeeper instances 
complain. Once the third one comes up they are "happy" and don't complain 

any more. You'd get similar messages if one of the Zookeeper nodes ever 
went down. 

As for the stopping of zk server - I've never had any problem issuing a 
stop command, but I'm running Linux so I may not be much good to you in 
that regard. 

On Tue, May 17, 2016 at 8:41 PM, scott.chu  wrote: 


> I tested yesterday and it proves my theory. I'll share what I do under 
> Windows on 1 PC here with you experienced guys and further newbies: 
> 
> 1>Download zookeeper 3.4.8. I unzip it and copy to 3 other different 
> folders: zk_1, zk_2, zk_3. 
> 2>For each zk_n folder, I do these things (Note: {n} means the last digit 
> in zk_n foler name): 
> a. Create a zoo_data folder under the root and create 'myid' with 
> notepad; the contents are just '{n}'. 
> b. Create zoo.cfg under conf folder with following contents: 
> clientPort=218{n} 
> initLimit=5 
> syncLimit=2 
> dataDir=D:/zk_{n}/zoo_data 
> ;if p2p-coneect-port or leader-election-port are all 
> same, then we should set maxClientCnxns=n 
> ;maxClientCnxns=3 
> ;server.x=host:p2p-connect-port:leader-election-port 
> server.1=localhost:2888:3888 
> server.2=localhost:2889:3889 
> server.3=localhost:2890:3890 
> 3> I download ZOOKEEPER-1122's zkServer.cmd. and go into each zk_n folder 
> and issue command: 
> bin\zkServer.cmd start 
> 
> [Question]: There's something I'd like to ask guys: When I start 
> zk_1, zk_2, the console keeps shows some warning messages. 
> Only after I start zk_3, the warning messages 
> is stopped. Is that normal? 
> 
> 4> I use zkui_win to see them all go online successfully. 
> 5> I goto Solr-5.4.1 folder, and issue following commands: 
> bin\solr start -c -s mynodes\node1 -z localhost:2181 
> bin\solr start -c -s mynodes\node1 -z localhost:2181 -p 
> 7973 
> bin\solr create -c cugna -d myconfigsets\cugna -shards 
> 1 -replicationFactor 2 -p 8983 
> 6> By using zkui_win again, I see: 
> ** Config 'cugna' is synchronized on zk_1 to zk_3. So this 
> proves my theory: we only have to specify one zk node and they'll 
> sync themselves. ** 
> 
> [Question]: I go into zk_n folder and issue 'bin\zkServer stop'. However, 
> this shows error message. It seems it can't taskkill the zk process for 
> some reason. The only way I stop them 
> is by closing DOS windows that has issued the 
> 'bin\zkServer start' command. Does anybody know why 'bin\zkServer stop' 
> doesn't work? 
> 
> Note: Gotta say sorry for the repetition of localhost:2181. It's my typo. 
> 
> scott.chu,scott@udngroup.com 
> 2016/5/18 (Wed) 
> - Original Message - 
> From: Abdel Belkasri 
> To: solr-user 
> CC: 
> Date: 2016/5/18 (Wed) 00:17 
> Subject: Re: [scottchu] How to specify multiple zk nodes using solr 
> startcommand under Windows 
> 
> 
> The repetition is just a cut and paste from Scott's post. 
> 
> How can I check if I am getting the ensemble or just a single zk? 
> 
> Also if this is not the way to specify an ensemble, what is the right way? 
> 
> 
> Because the comma delimited list does not work, I concur with Scott. 
> 
> On Tue, May 17, 2016 at 11:49 AM, Erick Erickson  
> 
> wrote: 
> 
> > Are you absolutely sure you're getting an _ensemble_ and 
> > not just connecting to a single node? My suspicion (without 
> > proof) is that you're just getting one -z option. It'll work as 
> > long as that ZK instance stays up, but it won't be fault-tolerant. 
> > 
> > And again you repeated the port (2181) twice. 
> > 
> > Best, 
> > Erick 
> > 
> > On Tue, May 17, 2016 at 8:02 AM, Abdel Belkasri  
> > wrote: 
> > > Hi Scott, 
> > > what worked for me in Windows is this (no ",") 
> > > bin\Solr start -c -s mynodes\node1 -z localhost:2181 -z localhost:2181 
> -z 
> > > localhost:2183 
> > > 
> > > -- Hope this helps 
> > > Abdel. 
> > > 
> > > On Tue, May 17, 2016 at 3:35 AM, scott.chu  
> > wrote: 
> > > 
> > >> I start 3 zk nodes at port 2181,2182, and 2183 on my local machine. 

> > >> Go into Solr 5.4.1 root folder and issue and issue the command in 
> > article 
> > >> 'Setting Up an External ZooKeeper Ensemble' in reference guide 
> > >> 
> > >> bin\Solr start -c -s mynodes\node1 -z 
> > >> localhost:2181,localhost:2181,localhost:2183 

Hostname of Solr server in Velocity template?

2016-05-19 Thread Alex Ott
Hello

I have code that needs to work with multiple Solr instances.
The code receives a REST API call initiated from the UI that Solr generates
from the Velocity template, and I need to know the hostname of the server
that initiated this call. But I can't find what parameter I can use to get
the hostname of the current Solr instance. I can of course write Javascript
to handle this task, but maybe there is a builtin Velocity property I can
ask for the host & port of the current server?

Thank you

-- 
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Skype: alex.ott


RE: puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Markus Jelsma
Hello - that sounds odd indeed. Did you check query and indexing analysis?
M.

 
 
-Original message-
> From:Dmitry Kan 
> Sent: Thursday 19th May 2016 9:36
> To: solr-user@lucene.apache.org
> Subject: puzzling StemmerOverrideFilterFactory
> 
> Hello!
> 
> Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
> mapping organization onto organiz (on query). On indexing
> SnowballPorterFilterFactory will stem organization to organ. Still
> searching with organization finds it in the index. Anybody has an idea why
> this happens?
> 
> This is on solr 4.10.2.
> 
> Thanks,
> Dmitry
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Re: Specifying dynamic field type without polluting actual field names with type indicators

2016-05-19 Thread Horváth Péter Gergely
Hi Steve,

Yes, I know the Schema API; however, I do not want to specify the field type
programmatically for every single field.

I would like to be able to specify the field type when it is being added
(similar to the name postfixes, but without affecting the field names).

Thanks,
Peter


2016-05-17 17:08 GMT+02:00 Steve Rowe :

> Hi Peter,
>
> Are you familiar with the Schema API?: <
> https://cwiki.apache.org/confluence/display/solr/Schema+API>
>
> You can use it to create fields, field types, etc. prior to ingesting your
> data.
>
> --
> Steve
> www.lucidworks.com
>
> > On May 17, 2016, at 11:05 AM, Horváth Péter Gergely <
> peter.gergely.horv...@gmail.com> wrote:
> >
> > Hi All,
> >
> > By default Solr allows you to define the type of a dynamic field by
> > appending a post-fix to the name itself. E.g. creating a color_s field
> > instructs Solr to create a string field. My understanding is that if we
> do
> > this, all queries must refer the post-fixed field name as well. So
> > instead of a query like color:"red", we will have to write something like
> > color_s:"red" -- and so on for other field types as well.
> >
> > I am wondering if it is possible to specify the data type used for a
> field
> > in Solr 6.0.0, without having to modify the field name. (Or at least in a
> > way that would allow us to use the original field name) Do you have any
> > idea, how to achieve this? I am fine, if we have to specify the field
> type
> > during the insertion of a document, however, I do not want to keep using
> > post-fixes while running queries...
> >
> > Thanks,
> > Peter
>
>
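For illustration, a Schema API request of the kind Steve describes declares an explicit type for a bare field name, so queries can use the original name without a `_s`-style suffix. The field name `color` is taken from the example in this thread; the values are placeholders. The JSON body is POSTed to `/solr/<collection>/schema`:

```json
{
  "add-field": {
    "name": "color",
    "type": "string",
    "stored": true
  }
}
```

After this, documents can be ingested with a plain `color` field and queried as `color:"red"`.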


Re: Hierarchial Support - Solr

2016-05-19 Thread Charlie Hull

On 18/05/2016 17:55, thiaga rajan wrote:



Hi Team,
We are exploring solr for one of our project as a search engine. It was a 
really a great tool around indexing and response time. While we are exploring 
we got the below questions and understandings. Kindly confirm the same.

We are actually trying to implement the search engine for a hierarchical 
search. (Tree structure). We have flatten our data structure and exported the 
data in to solr as solr is more meant for flat structure in terms of request 
and response
Example:
1 -- 11 -- 111, 112, 113
  -- 12 -- 121
  -- 13 -- 131, 132, 133
Our data structure will resemble something like the below and exported the same 
in Solr. Now when the customer enters the key for search, we need to search in 
the below structure. Each of the row in the below table corresponds to the each 
of the document in Solr.
We were able to achieve this in the below structure but need to confirm on the 
below items.
1. Search the level with the keyword and send only the matched node and its 
parents, not the children of the node.
Example: if the user enters 12, then we need only 1, 12 and not the children 
(i.e. 121).
I assume we don't have a choice to achieve this with Solr and we need to 
write a custom implementation for this. Correct me if I am wrong.
2. We need to do a distinct selection for each of the documents. Example: if 
the search keyword is 11, then I should send 1, 11 (kind of a SQL distinct: 
SELECT DISTINCT L1, L2). I have read various forums on this and it looks like 
we have options around faceting and grouping, but not exactly the same as the 
SQL distinct we are asking for.
If we are not able to get distinct results, how will we apply pagination? 
Example: if the page size is 1 to 10 and we have 5 duplicate documents, then 
taking a distinct of those might fail, as we still have space for another 5 
records. We understand faceting is arranging the results based on a faceted 
field rather than giving the results in the requested way.
Please confirm of our understanding on the above options. Thanks
Document structure below


Hi,

This sounds a bit like the Siren implementation 
https://siren.solutions/siren/faqs/ where the hierarchical structure is 
encoded in a document. I'm pretty sure this is a commercial add-on though.


You might also take a look at the open source ontology indexer for Solr 
we developed as part of the BioSolr project:

https://github.com/flaxsearch/BioSolr/tree/master/ontology

Regards

Charlie




| L1 | L2 | L3 |
| 1 | 11 | 111 |
| 1 | 11 | 112 |
| 1 | 11 | 113 |
| 1 | 12 | 121 |
| 1 | 13 | 131 |
| 1 | 13 | 132 |
| 1 | 13 | 133 |
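The two requirements above can be sketched outside Solr over the flattened rows (plain Python, purely to illustrate the logic a custom component or client would implement, not a Solr feature):

```python
# Flattened hierarchy rows, one tuple per (L1, L2, L3) document.
rows = [
    ("1", "11", "111"), ("1", "11", "112"), ("1", "11", "113"),
    ("1", "12", "121"),
    ("1", "13", "131"), ("1", "13", "132"), ("1", "13", "133"),
]

def ancestors_of_match(rows, keyword):
    """Requirement 1: return the matched node plus its ancestors, never children."""
    paths = set()
    for row in rows:
        for depth, value in enumerate(row):
            if value == keyword:
                paths.add(row[: depth + 1])  # e.g. ("1", "12") for keyword "12"
    return sorted(paths)

def distinct_levels(rows, depth):
    """Requirement 2: SQL-style SELECT DISTINCT over the first `depth` levels."""
    return sorted({row[:depth] for row in rows})

ancestors_of_match(rows, "12")  # keyword 12 -> path (1, 12), child 121 excluded
distinct_levels(rows, 2)        # distinct (L1, L2) pairs, duplicates collapsed
```

Pagination would then be applied to the deduplicated list, which is why distinct selection has to happen before the page window is cut.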








--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


puzzling StemmerOverrideFilterFactory

2016-05-19 Thread Dmitry Kan
Hello!

Puzzling case: there is a <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" /> on query side, but not indexing. One rule is
mapping organization onto organiz (on query). On indexing
SnowballPorterFilterFactory will stem organization to organ. Still
searching with organization finds it in the index. Anybody has an idea why
this happens?

This is on solr 4.10.2.

Thanks,
Dmitry

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


SolrCloud 6 Join Stream and pagination

2016-05-19 Thread Roshan Kamble
Hello,

I am using Solr 6 in cloud mode.
In order to search within different collections I am using InnerJoinStream. 
(using qt=export in order to get correct result)

Is there any way to get paginated result?


Regards,
Roshan

The information in this email is confidential and may be legally privileged. It 
is intended solely for the addressee. Access to this email by anyone else is 
unauthorised. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken or omitted to be taken in reliance on it, is 
prohibited and may be unlawful.


How to stop searches to solr while full data import is going in SOLR

2016-05-19 Thread preeti kumari
Hi,

I am using solr 5.2.1. I have two clusters Primary A and Primary B.
I was pinging servers to check whether they are up or not to route the
searches to working cluster A or B.

But while I am running a full data import in primary cluster A, the index
does not yet contain all the data, and pinging servers will not help as my
solr servers would still be responding.

But I want my searches to go to Cluster B instead of A.

Please help me with a way for solr to indicate that it is not ready to serve
searches while a full data import is running.

Thanks
Preeti


Re: Sub faceting on string field using json facet runs extremly slow

2016-05-19 Thread Vijay Tiwary
Can somebody confirm whether the jira SOLR-8096 will also affect JSON facets,
as I see sub faceting using a term facet on a string field running 5x slower
than on an integer field for the same number of hits and unique terms.
On 17-May-2016 3:33 pm, "Vijay Tiwary"  wrote:

> Below is the request
>
> q=*:*&rows=0&start=0&json.facet={
>   "customer_id": {
>     "type": "terms",
>     "limit": -1,
>     "field": "cid_ti",
>     "mincount": 1,
>     "facet": {
>       "contact_s": {
>         "type": "terms",
>         "limit": 1,
>         "field": "contact_s",
>         "mincount": 1
>       }
>     }
>   }
> }&fq=age_td:[25 TO 50]
>
>
>
>
>
>
>
>
>
> On 17-May-2016 2:20 pm, "chandan khatri"  wrote:
>
>> Can you please share the query for sub faceting?
>>
>> On Tue, May 17, 2016 at 2:13 PM, Vijay Tiwary 
>> wrote:
>>
>>> Hello all,
>>> I have an index of 8 shards having 1 replica each, distributed across an
>>> 8-node
>>> solr cloud. Size of index is 300 gb having 30 million documents. Solr
>>> json
>>> facet runs extremely slow if I am sub faceting on a string field, even if
>>> the numFound is only around 2 (also I am not returning any rows, i.e.
>>> rows=0).
>>> Is there any way to improve the performance?
>>>
>>> Thanks,
>>> Vijay
>>>
>>
>>


Sorting on child document field.

2016-05-19 Thread Pranaya Behera

Hi,

 How can I sort the results i.e. from a block join parent query 
using the field from child document field ?


Thanks & Regards

Pranaya Behera