[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"

2019-08-05 Thread Alexander S. (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899976#comment-16899976
 ] 

Alexander S. commented on SOLR-6468:


Just wanted to give a small update – we upgraded to Solr 8 over the weekend and 
search seems to be working well. 
[MappingCharFilterFactory|http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/charfilter/MappingCharFilterFactory.html]
 also works. [~steve_rowe], are there any known downsides to replacing the 
StopFilterFactory with the MappingCharFilterFactory?
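
For anyone following along, a char-filter-based setup might look roughly like 
this (a hedged sketch with assumed file and field names, not a tested config):

{code}
# mappings.txt -- rewrite protocol/host prefixes to nothing before tokenization
"http" => ""
"https" => ""
"ftp" => ""
"www" => ""
{code}

{code}
<charFilter class="solr.MappingCharFilterFactory" mapping="mappings.txt"/>
{code}

Unlike a stop filter, a char filter rewrites the raw input before tokenization, 
so it leaves no position gap (the "?" hole visible in the parsed phrase queries 
quoted below).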

> Regression: StopFilterFactory doesn't work properly without deprecated 
> enablePositionIncrements="false"
> ---
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.8.1, 4.9, 5.3.1, 6.6.2, 7.1
>Reporter: Alexander S.
>Priority: Major
> Attachments: FieldValue.png
>
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
> <!-- reconstructed sketch; the XML tags were stripped by the mail archive -->
> <fieldType name="url_words_ngram" class="solr.TextField" autoGeneratePhraseQueries="true">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>   </analyzer>
> </fieldType>
> {code}
> * Stop words:
> {code}
> http 
> https 
> ftp 
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these does:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: 
> http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: 
> http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here: 
> http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I didn't find a clear explanation of how we can upgrade Solr; there is no 
> replacement or workaround for this, so this is not just a major change but 
> a major disrespect to all existing Solr users who rely on this feature.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13293) org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient - Error consuming and closing http response stream.

2019-08-03 Thread Alexander S. (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899459#comment-16899459
 ] 

Alexander S. commented on SOLR-13293:
-

I just upgraded from Solr 5 to 8 and am also seeing these errors.

> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient - Error 
> consuming and closing http response stream.
> -
>
> Key: SOLR-13293
> URL: https://issues.apache.org/jira/browse/SOLR-13293
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: 8.0
>Reporter: Karl Stoney
>Priority: Minor
>
> Hi, 
> Testing out branch_8x, we're randomly seeing the following errors on a simple 
> 3-node cluster. It doesn't appear to affect replication (the cluster remains 
> green).
> They come in bulk (literally thousands at a time).
> There were no network issues at the time.
> {code:java}
> 16:53:01.492 [updateExecutor-4-thread-34-processing-x:at-uk_shard1_replica_n1 
> r:core_node3 null n:solr-2.search-solr.preprod.k8.atcloud.io:80_solr c:at-uk 
> s:shard1] ERROR 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient - Error 
> consuming and closing http response stream.
> java.nio.channels.AsynchronousCloseException: null
> at 
> org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:316)
>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
> at java.io.InputStream.read(InputStream.java:101) ~[?:1.8.0_191]
> at 
> org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:287)
>  ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114]
> at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:283)
>  ~[solr-solrj-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT 
> b14748e61fd147ea572f6545265b883fa69ed27f - root
> - 2019-03-04 16:30:04]
> at 
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176)
>  ~[solr-solrj-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT 
> b14748e61fd147ea572f6545265b883fa69ed27f - root - 2019-03-04
> 16:30:04]
> at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>  ~[metrics-core-3.2.6.jar:3.2.6]
> at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>  ~[solr-solrj-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT 
> b14748e61fd147ea572f6545265b883fa69ed27f - root - 2019-03-04 16:30:04]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_191]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_191]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
> {code}






[jira] [Commented] (SOLR-6769) Election bug

2019-08-01 Thread Alexander S. (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897936#comment-16897936
 ] 

Alexander S. commented on SOLR-6769:


Hi, unfortunately I can't test with the latest versions since we are tied to 
Solr 5. I tuned our caches and haven't seen this error any more, so let's close 
this for now.

> Election bug
> 
>
> Key: SOLR-6769
> URL: https://issues.apache.org/jira/browse/SOLR-6769
> Project: Solr
>  Issue Type: Bug
>Reporter: Alexander S.
>Priority: Major
> Attachments: Screenshot 876.png
>
>
> Hello, I have a very simple setup: 2 shards and 2 replicas (4 nodes in 
> total).
> What I did was just stop the shards; the first shard stopped immediately, but 
> the second one took about 5 minutes to stop. You can see on the screenshot 
> what happened next. In short:
> 1. Shard 1 stopped normally.
> 2. Replica 1 became the leader.
> 3. Shard 2 was still performing some job but wasn't accepting connections.
> 4. Replica 2 did not become the leader because Shard 2 was still there but 
> not working.
> 5. The entire cluster went down until Shard 2 stopped and Replica 2 became 
> the leader.
> Marked as critical because this shuts down the entire cluster. Please adjust 
> if I am wrong.






[jira] [Updated] (SOLR-12363) Duplicates with random search, cursors, and fixed seed

2018-05-16 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-12363:

Description: 
We have a SolrCloud cluster and just updated one of our views to use cursors 
with random ordering. Our goal was to build an infinite scroll with random 
ordering so we can shuffle the results once every 24 hours.

To do so, we save the seed used for the random order in a cookie with a 24-hour 
expiration period, which didn't work as expected:
 # Results are shuffled on every request (even though we pass the initial 
cursor value "*" and the same random seed for ordering that we already used).
 # Results sometimes contain duplicates. Not a lot of them, but they appear 
from time to time.

In our *schema.xml* we have:
{code:java}
<!-- reconstructed sketch; the XML tags were stripped by the mail archive -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
{code}
In our search requests, we order by *random_123 asc, id asc*, where *123* is 
the seed from the cookie.

Here is the page [https://awards.wegohealth.com/nominees]

Even when I copy the "next page" URL from the Google Chrome developer console 
and open it in separate tabs, it yields different results: 
[https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]

So it feels like the seed parameter we use is either ignored or interpreted 
differently by every shard; I'm not sure.

On the screenshots, you can see that the URL is the same but the results are 
different.
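
The request shape described above would be roughly (hypothetical collection 
name; the seed 123 stands in for the cookie value):

{code}
/solr/nominees/select?q=*:*&sort=random_123%20asc,id%20asc&cursorMark=*&rows=20
{code}

Note that cursorMark pagination assumes a stable sort between requests, while 
RandomSortField derives its ordering from the field name (the seed) and the 
index version, so any commit between requests can reshuffle the order and yield 
the duplicates described.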

  was:
We do have a SolrCloud cluster and just updated one of our views to use cursors 
with the random order. Our goal was to use an infinite scroll with the random 
ordering so we can shuffle results once every 24 hours.

To do so we save the seed that we use in our random order to the cookies with 
the 24 hours expiration period, which didn't work as expected:
 # Results are shuffled with every request (every time we pass the initial 
cursor value "*" and the same random value for ordering we already used).
 # Results contain duplicates sometimes. Not a lot of them, but from time to 
time they appear.

In our *schema.xml* we have:
{code:java}

{code}
In our search requests, we order by *random_123 asc, id asc*, where *123* is 
the seed from cookies.

Here is the page [https://awards.wegohealth.com/nominees]

-Even when I try to get the "next page" URL from google chrome developer 
console and open it in separate tabs it yields different results: 
[https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]-

So it feels like the seed parameter we use is ignored or every shard 
understands it differently, not sure.

On the screenshots, you can see the URL is the same and results are different.


> Duplicates with random search, cursors, and fixed seed
> --
>
> Key: SOLR-12363
> URL: https://issues.apache.org/jira/browse/SOLR-12363
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.3.1
>Reporter: Alexander S.
>Priority: Major
> Attachments: Screen shot 2018-05-16 at 14.51.19.png, Screen shot 
> 2018-05-16 at 14.51.23.png, Screen shot 2018-05-16 at 14.51.26.png
>
>






[jira] [Updated] (SOLR-12363) Duplicates with random search, cursors, and fixed seed

2018-05-16 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-12363:

Description: 
We do have a SolrCloud cluster and just updated one of our views to use cursors 
with the random order. Our goal was to use an infinite scroll with the random 
ordering so we can shuffle results once every 24 hours.

To do so we save the seed that we use in our random order to the cookies with 
the 24 hours expiration period, which didn't work as expected:
 # Results are shuffled with every request (every time we pass the initial 
cursor value "*" and the same random value for ordering we already used).
 # Results contain duplicates sometimes. Not a lot of them, but from time to 
time they appear.

In our *schema.xml* we have:
{code:java}
<!-- reconstructed sketch; the XML tags were stripped by the mail archive -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
{code}
In our search requests, we order by *random_123 asc, id asc*, where *123* is 
the seed from cookies.

Here is the page [https://awards.wegohealth.com/nominees]

-Even when I try to get the "next page" URL from google chrome developer 
console and open it in separate tabs it yields different results: 
[https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]-

So it feels like the seed parameter we use is ignored or every shard 
understands it differently, not sure.

On the screenshots, you can see the URL is the same and results are different.

  was:
We do have a SolrCloud cluster and just updated one of our views to use cursors 
with the random order. Our goal was to use an infinite scroll with the random 
ordering so we can shuffle results once every 24 hours.

To do so we save the seed that we use in our random order to the cookies with 
the 24 hours expiration period, which didn't work as expected:
 # Results are shuffled with every request (every time we pass the initial 
cursor value "*" and the same random value for ordering we already used).
 # Results contain duplicates sometimes. Not a lot of them, but from time to 
time they appear.

In our *schema.xml* we have:
{code:java}

{code}
In our search requests, we order by *random_123 asc, id asc*, where *123* is 
the seed from cookies.

Here is the page [https://awards.wegohealth.com/nominees]

Even when I try to get the "next page" URL from google chrome developer console 
and open it in separate tabs it yields different results: 
[https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]

So it feels like the seed parameter we use is ignored or every shard 
understands it differently, not sure.

On the screenshots, you can see the URL is the same and results are different.


> Duplicates with random search, cursors, and fixed seed
> --
>
> Key: SOLR-12363
> URL: https://issues.apache.org/jira/browse/SOLR-12363
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.3.1
>Reporter: Alexander S.
>Priority: Major
> Attachments: Screen shot 2018-05-16 at 14.51.19.png, Screen shot 
> 2018-05-16 at 14.51.23.png, Screen shot 2018-05-16 at 14.51.26.png
>
>






[jira] [Created] (SOLR-12363) Duplicates with random search, cursors, and fixed seed

2018-05-16 Thread Alexander S. (JIRA)
Alexander S. created SOLR-12363:
---

 Summary: Duplicates with random search, cursors, and fixed seed
 Key: SOLR-12363
 URL: https://issues.apache.org/jira/browse/SOLR-12363
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 5.3.1
Reporter: Alexander S.
 Attachments: Screen shot 2018-05-16 at 14.51.19.png, Screen shot 
2018-05-16 at 14.51.23.png, Screen shot 2018-05-16 at 14.51.26.png

We do have a SolrCloud cluster and just updated one of our views to use cursors 
with the random order. Our goal was to use an infinite scroll with the random 
ordering so we can shuffle results once every 24 hours.

To do so we save the seed that we use in our random order to the cookies with 
the 24 hours expiration period, which didn't work as expected:
 # Results are shuffled with every request (every time we pass the initial 
cursor value "*" and the same random value for ordering we already used).
 # Results contain duplicates sometimes. Not a lot of them, but from time to 
time they appear.

In our *schema.xml* we have:
{code:java}
<!-- reconstructed sketch; the XML tags were stripped by the mail archive -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
{code}
In our search requests, we order by *random_123 asc, id asc*, where *123* is 
the seed from cookies.

Here is the page [https://awards.wegohealth.com/nominees]

Even when I try to get the "next page" URL from google chrome developer console 
and open it in separate tabs it yields different results: 
[https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]

So it feels like the seed parameter we use is ignored or every shard 
understands it differently, not sure.

On the screenshots, you can see the URL is the same and results are different.






[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"

2018-02-14 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363744#comment-16363744
 ] 

Alexander S. commented on SOLR-6468:


I think so; Solr and Lucene versions are different things. Solr 5.3.1 supports 
Lucene version 4.3, but newer versions of Solr probably don't.

I am not absolutely sure which exact Solr version dropped support for this; I'm 
just saying that we're on Solr 5.3.1 and it works there. It definitely didn't 
work in Solr 6 (we tried it) and, if I am not mistaken, it didn't work in Solr 
5.5 either.

> Regression: StopFilterFactory doesn't work properly without deprecated 
> enablePositionIncrements="false"
> ---
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.8.1, 4.9, 5.3.1, 7.1, 6.6.2
>Reporter: Alexander S.
>Priority: Major
> Attachments: FieldValue.png
>
>






[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"

2018-02-13 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363583#comment-16363583
 ] 

Alexander S. commented on SOLR-6468:


Hey, we're on 5.3.1 because of this. AFAIK this doesn't work in newer versions.

> Regression: StopFilterFactory doesn't work properly without deprecated 
> enablePositionIncrements="false"
> ---
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.8.1, 4.9, 5.3.1, 7.1, 6.6.2
>Reporter: Alexander S.
>Priority: Major
> Attachments: FieldValue.png
>
>






[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"

2018-02-11 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-6468:
---
Affects Version/s: 7.1
   6.6.2

> Regression: StopFilterFactory doesn't work properly without deprecated 
> enablePositionIncrements="false"
> ---
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.8.1, 4.9, 5.3.1, 7.1, 6.6.2
>Reporter: Alexander S.
>Priority: Major
> Attachments: FieldValue.png
>
>






[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"

2018-02-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359924#comment-16359924
 ] 

Alexander S. commented on SOLR-6468:


I'm wondering how we can bring attention to this problem.

> Regression: StopFilterFactory doesn't work properly without deprecated 
> enablePositionIncrements="false"
> ---
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.8.1, 4.9, 5.3.1
>Reporter: Alexander S.
>Priority: Major
> Attachments: FieldValue.png
>
>






[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"

2018-02-11 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-6468:
---
Summary: Regression: StopFilterFactory doesn't work properly without 
deprecated enablePositionIncrements="false"  (was: Regression: 
StopFilterFactory doesn't work properly without 
enablePositionIncrements="false")

> Regression: StopFilterFactory doesn't work properly without deprecated 
> enablePositionIncrements="false"
> ---
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.8.1, 4.9, 5.3.1
>Reporter: Alexander S.
>Priority: Major
> Attachments: FieldValue.png
>
>






[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2018-02-11 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-6468:
---
Affects Version/s: 5.3.1

> Regression: StopFilterFactory doesn't work properly without 
> enablePositionIncrements="false"
> 
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 4.8.1, 4.9, 5.3.1
>Reporter: Alexander S.
>Priority: Major
> Attachments: FieldValue.png
>
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
>  autoGeneratePhraseQueries="true">
>   
> 
>  ignoreCase="true" />
> 
>   
> 
> {code}
> * Stop words:
> {code}
> http 
> https 
> ftp 
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these does:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: 
> http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: 
> http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here: 
> http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I didn't find a clear explanation of how we can upgrade Solr; there is no 
> replacement or workaround for this, so this is not just a major change but 
> a major disrespect to all existing Solr users who rely on this feature.
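The position-gap behavior behind the leading "?" in the parsed phrase query can be sketched in miniature. This is not Lucene code, just a toy analyzer (regex tokenizer, hard-coded stopword set) illustrating why removing "https"/"www" with position increments enabled leaves holes that break the phrase match against the indexed `twitter.com/testuser`:

```python
import re

STOPWORDS = {"http", "https", "ftp", "www"}

def analyze(text, keep_position_increments=True):
    """Toy analyzer: lowercase, split on non-word characters, drop
    stopwords.  With keep_position_increments=True (the only behavior
    modern Lucene supports) each removed stopword leaves a positional
    gap instead of shifting the following tokens down."""
    out, pos = [], 0
    for tok in re.split(r"\W+", text.lower()):
        if not tok:
            continue
        if tok in STOPWORDS:
            if keep_position_increments:
                pos += 1  # hole where the stopword was
            continue
        out.append((tok, pos))
        pos += 1
    return out

# The gaps at positions 0 and 1 are what surface as "?" in the
# parsed phrase query, so it no longer matches the indexed tokens.
print(analyze("https://www.twitter.com/testuser"))
print(analyze("https://www.twitter.com/testuser", keep_position_increments=False))
```

With increments disabled (the removed pre-5.0 behavior), both inputs analyze to the same `(token, position)` sequence as the indexed `twitter.com/testuser`, which is why the old flag made these queries match.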






[jira] [Commented] (SOLR-11939) Collection API: property.name ignored when creating collections

2018-02-04 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351708#comment-16351708
 ] 

Alexander S. commented on SOLR-11939:
-

Hi Varun,

I am referring to 
[https://lucene.apache.org/solr/guide/6_6/collections-api.html]
|property._name_=_value_|string|No| |Set core property _name_ to _value_. See 
the section [Defining 
core.properties|https://lucene.apache.org/solr/guide/6_6/defining-core-properties.html#defining-core-properties]
 for details on supported properties and values.|

All shards and replicas are created on separate Solr instances so a single name 
for all cores would work in this case.

Well, I started looking into core names mostly because the web UI (at least in 
5.3.1) doesn't work with collections, so I wasn't aware that query requests 
would also work with collection names. The core name doesn't matter that much 
then, and we're fine with generic core names.

It would be good to mention this in the docs somewhere.

Best,

Alexander S.

> Collection API: property.name ignored when creating collections
> ---
>
> Key: SOLR-11939
> URL: https://issues.apache.org/jira/browse/SOLR-11939
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Affects Versions: 5.3.1
>Reporter: Alexander S.
>Assignee: Varun Thacker
>Priority: Major
>
> Trying to create a collection this way:
> {code:java}
> /solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code}
> This appears in the log:
> {code:java}
> OverseerCollectionProcessor.processMessage : create , {
>   "name":"carmen-test",
>   "fromApi":"true",
>   "replicationFactor":"1",
>   "collection.configName":"carmen",
>   "numShards":"4",
>   "shards":"shard1,shard2,shard3,shard4",
>   "stateFormat":"2",
>   "property.name":"carmen_test",
>   "router.name":"compositeId",
>   "operation":"create"}{code}
> But the resulting core name is *carmen-test_shard1_replica1*, matching 
> "collection name" + shard name + replica number.
> How can I set a custom core name when creating a collection?
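For reference, the CREATE call with its parameter names restored (they can be read back from the Overseer log entry above) can be assembled like this; the path is relative to a Solr base URL, and `property.name` is the parameter the report says gets ignored:

```python
from urllib.parse import urlencode

# Parameter names recovered from the OverseerCollectionProcessor
# log entry; values are the ones from the report.
params = {
    "action": "CREATE",
    "name": "carmen-test",
    "replicationFactor": 1,
    "numShards": 4,
    "shards": "shard1,shard2,shard3,shard4",
    "collection.configName": "carmen",
    "router.name": "compositeId",
    "property.name": "carmen_test",
    "wt": "json",
}
url = "/solr/admin/collections?" + urlencode(params)
print(url)
```

Building the query string with `urlencode` avoids the easy mistake of dropping an `&` between parameters when hand-writing the URL.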






[jira] [Comment Edited] (SOLR-11939) Collection API: property.name ignored when creating collections

2018-02-02 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350473#comment-16350473
 ] 

Alexander S. edited comment on SOLR-11939 at 2/2/18 3:07 PM:
-

Found this discussion 
[http://lucene.472066.n3.nabble.com/Core-property-name-ignored-when-creating-collection-using-API-td4183405.html]

It seems I don't have to worry about the core name, as Solr is moving 
towards collections.

UPD: but this is still a discrepancy between the docs and the API. I spent an 
hour figuring this out, patching a Chef cookbook to add these properties, only 
to find that it doesn't work as described in the docs. 


was (Author: aheaven):
Found this discussion 
[http://lucene.472066.n3.nabble.com/Core-property-name-ignored-when-creating-collection-using-API-td4183405.html]

It seems I don't have to worry about the core name, as Solr is moving 
towards collections.

> Collection API: property.name ignored when creating collections
> ---
>
> Key: SOLR-11939
> URL: https://issues.apache.org/jira/browse/SOLR-11939
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Affects Versions: 5.3.1
>Reporter: Alexander S.
>Priority: Major
>
> Trying to create a collection this way:
> {code:java}
> /solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code}
> This appears in the log:
> {code:java}
> OverseerCollectionProcessor.processMessage : create , {
>   "name":"carmen-test",
>   "fromApi":"true",
>   "replicationFactor":"1",
>   "collection.configName":"carmen",
>   "numShards":"4",
>   "shards":"shard1,shard2,shard3,shard4",
>   "stateFormat":"2",
>   "property.name":"carmen_test",
>   "router.name":"compositeId",
>   "operation":"create"}{code}
> But the resulting core name is *carmen-test_shard1_replica1*, matching 
> "collection name" + shard name + replica number.
> How can I set a custom core name when creating a collection?






[jira] [Commented] (SOLR-11939) Collection API: property.name ignored when creating collections

2018-02-02 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350473#comment-16350473
 ] 

Alexander S. commented on SOLR-11939:
-

Found this discussion 
[http://lucene.472066.n3.nabble.com/Core-property-name-ignored-when-creating-collection-using-API-td4183405.html]

It seems I don't have to worry about the core name, as Solr is moving 
towards collections.

> Collection API: property.name ignored when creating collections
> ---
>
> Key: SOLR-11939
> URL: https://issues.apache.org/jira/browse/SOLR-11939
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Affects Versions: 5.3.1
>Reporter: Alexander S.
>Priority: Major
>
> Trying to create a collection this way:
> {code:java}
> /solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code}
> This appears in the log:
> {code:java}
> OverseerCollectionProcessor.processMessage : create , {
>   "name":"carmen-test",
>   "fromApi":"true",
>   "replicationFactor":"1",
>   "collection.configName":"carmen",
>   "numShards":"4",
>   "shards":"shard1,shard2,shard3,shard4",
>   "stateFormat":"2",
>   "property.name":"carmen_test",
>   "router.name":"compositeId",
>   "operation":"create"}{code}
> But the resulting core name is *carmen-test_shard1_replica1*, matching 
> "collection name" + shard name + replica number.
> How can I set a custom core name when creating a collection?






[jira] [Created] (SOLR-11939) Collection API: property.name ignored when creating collections

2018-02-02 Thread Alexander S. (JIRA)
Alexander S. created SOLR-11939:
---

 Summary: Collection API: property.name ignored when creating 
collections
 Key: SOLR-11939
 URL: https://issues.apache.org/jira/browse/SOLR-11939
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 5.3.1
Reporter: Alexander S.


Trying to create a collection this way:
{code:java}
/solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code}
This appears in the log:
{code:java}
OverseerCollectionProcessor.processMessage : create , {
  "name":"carmen-test",
  "fromApi":"true",
  "replicationFactor":"1",
  "collection.configName":"carmen",
  "numShards":"4",
  "shards":"shard1,shard2,shard3,shard4",
  "stateFormat":"2",
  "property.name":"carmen_test",
  "router.name":"compositeId",
  "operation":"create"}{code}
But the resulting core name is *carmen-test_shard1_replica1*, matching 
"collection name" + shard name + replica number.

How can I set a custom core name when creating a collection?






[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"

2016-10-03 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15541697#comment-15541697
 ] 

Alexander S. commented on SOLR-6468:


We can't upgrade to Solr 6 now because of this.

> Regression: StopFilterFactory doesn't work properly without 
> enablePositionIncrements="false"
> 
>
> Key: SOLR-6468
> URL: https://issues.apache.org/jira/browse/SOLR-6468
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.8.1, 4.9
>Reporter: Alexander S.
>
> Setup:
> * Schema version is 1.5
> * Field config:
> {code}
>  autoGeneratePhraseQueries="true">
>   
> 
>  ignoreCase="true" />
> 
>   
> 
> {code}
> * Stop words:
> {code}
> http 
> https 
> ftp 
> www
> {code}
> So very simple. In the index I have:
> * twitter.com/testuser
> All these queries do match:
> * twitter.com/testuser
> * com/testuser
> * testuser
> But none of these does:
> * https://twitter.com/testuser
> * https://www.twitter.com/testuser
> * www.twitter.com/testuser
> Debug output shows:
> "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
> But we need:
> "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
> Complete debug outputs:
> * a valid search: 
> http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
> * an invalid search: 
> http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
> The complete discussion and explanation of the problem is here: 
> http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
> I didn't find a clear explanation of how we can upgrade Solr; there is no 
> replacement or workaround for this, so this is not just a major change but 
> a major disrespect to all existing Solr users who rely on this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2015-09-09 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738200#comment-14738200
 ] 

Alexander S. commented on SOLR-3274:


Hi, just wanted to let you know that adding 2 new ZK servers (so I have 5 
running ZK instances) improved the situation a lot.

But I found one weird thing with ZK:
{code}
java.net.UnknownHostException: zoo5.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,235 - WARN  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open 
channel to 2 at election address zoo2.devops:3888
java.net.UnknownHostException: zoo2.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,235 - WARN  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open 
channel to 1 at election address zoo1.devops:3888
java.net.UnknownHostException: zoo1.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,236 - WARN  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open 
channel to 4 at election address zoo4.devops:3888
java.net.UnknownHostException: zoo4.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
{code}

I opened two SSH sessions to that server and monitored the log with tail. 
While ZK was logging these errors I was able to ping the zoo1/2/4/5.devops 
servers and to connect to ZK there with telnet, so it seems something went 
wrong within ZK itself. At the same time I saw these "cannot talk to ZK" 
errors in Solr.

Eventually I just restarted the broken ZK instance and everything was fine 
again. So I guess Solr was trying to connect specifically to this broken ZK 
instance (I can't say for sure, since Solr's log doesn't mention which 
instance it failed to connect to).
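The symptom (ping and telnet work from a shell while the JVM keeps throwing UnknownHostException) is consistent with stale name resolution inside the process rather than a network problem. A small diagnostic sketch that separates the two failure modes; Python rather than ZK code, and the zoo*.devops host names are just the ones from the log:

```python
import socket

def check_endpoint(host, port, timeout=2.0):
    """Return (resolves, connects): whether the name resolves at all,
    and whether a TCP connect to the given port succeeds.  A host that
    resolves but refuses connections points at the service; a host
    that doesn't resolve points at DNS."""
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror:
        return (False, False)
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return (True, True)
    except OSError:
        return (True, False)

# Election port 3888 as in the log; run this against your own quorum.
for host in ["zoo1.devops", "zoo2.devops", "zoo4.devops", "zoo5.devops"]:
    print(host, check_endpoint(host, 3888))
```

Note that restarting the instance fixing it would also fit a known ZooKeeper limitation: reportedly, older server versions resolved their quorum peers' addresses only once and did not re-resolve on failure (re-resolution landed around 3.4.13), so a transient DNS hiccup could stick until restart.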

> ZooKeeper related SolrCloud problems
> 
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Any
>Reporter: Per Steffensen
>
> Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 
> Solr servers running 28 slices of the same collection (collA); all slices 
> have one replica (two shards in all: leader + replica), 56 cores in total 
> (8 shards on each Solr instance). But anyways...
> Besides the problem reported in SOLR-3273, the system seems to run fine under 
> high load for several hours, but eventually errors like the ones shown below 
> start to occur. I might be wrong, but they all seem to indicate some kind of 
> instability in the 
[jira] [Comment Edited] (SOLR-3274) ZooKeeper related SolrCloud problems

2015-09-09 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738200#comment-14738200
 ] 

Alexander S. edited comment on SOLR-3274 at 9/10/15 5:43 AM:
-

Hi, just wanted to let you know that adding 2 new ZK servers (so I have 5 
running ZK instances) improved the situation a lot.

But I found one weird thing with the ZK:
{code}
java.net.UnknownHostException: zoo5.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,235 - WARN  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open 
channel to 2 at election address zoo2.devops:3888
java.net.UnknownHostException: zoo2.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,235 - WARN  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open 
channel to 1 at election address zoo1.devops:3888
java.net.UnknownHostException: zoo1.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
2015-09-10 01:13:21,236 - WARN  
[QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open 
channel to 4 at election address zoo4.devops:3888
java.net.UnknownHostException: zoo4.devops
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402)
at 
org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840)
at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762)
{code}

I opened two SSH sessions to that server and monitored the log with tail. 
While ZK was logging these errors I was able to ping the zoo1/2/4/5.devops 
servers and to connect to ZK there with telnet, so it seems something went 
wrong within ZK itself. At the same time I saw these "cannot talk to ZK" 
errors in Solr.

Eventually I just restarted the broken ZK instance and everything was fine 
again. So I guess Solr was trying to connect specifically to this broken ZK 
instance (I can't say for sure, since Solr's log doesn't mention which 
instance it failed to connect to).

UPD: but I still often see these errors in the ZK logs:
{code}
2015-09-10 01:31:28,804 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.128.202.22:35990
2015-09-10 01:31:28,847 - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of 
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, 
likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:744)
2015-09-10 01:31:28,847 - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
connection for client /10.128.202.22:35990 (no session established for client)
{code}

[jira] [Comment Edited] (SOLR-6875) No data integrity between replicas

2015-06-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582960#comment-14582960
 ] 

Alexander S. edited comment on SOLR-6875 at 6/12/15 5:24 AM:
-

Got another error today on a 4-shard setup, each shard with 2 replicas (8 
nodes in total).

On shard 4 / replica 1 I see the following error: [^replica1.png]
On shard 4 / replica 2, this one: [^replica2.png]

Here's the backtrace for the error on the first screenshot:
{code}
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{code}

After all this replica 1 shows:
{quote}
numDocs: 28 215 608
{quote}

And the replica 2 shows:
{quote}
numDocs: 28 215 609
{quote}

Everything worked well for a few months until yesterday, when we started to 
reindex some data (like 1.7m records).

Our Solr setup uses large pages and there are enough resources. Here's how 
we run the instances:
{code}
exec chpst -u solr java -Xms6G -Xmx8G -XX:+UseConcMarkSweepGC 
-XX:+UseLargePages -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled 
-XX:+UseLargePages -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=75 
-DzkHost=zoo5.devops:2181,zoo4.devops:2181,zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181
 -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf 
-Dbootstrap_conf=true -DnumShards=4 -jar start.jar etc/jetty.xml
{code}

The server has 16 CPU cores and SSD RAID 10; the load average is usually 
between 2 and 3. The charts also don't show anything suspicious in server 
load; it is very stable.

So it seems something went wrong during recovery after the network error. I'm 
not sure how to debug this deeper or what the warnings in the log mean, for 
example the last two messages on the first screenshot, from 
DistributedUpdateProcessor and CoreAdminHandler.
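An off-by-one like the 28 215 608 vs 28 215 609 above can be spotted mechanically by grouping numDocs per shard and flagging disagreement. A sketch, assuming the default `<collection>_<shard>_replica<N>` core-naming convention; the counts are the ones from the screenshots:

```python
from collections import defaultdict

def mismatched_shards(counts_by_replica):
    """Group replica core names by shard and return only the shards
    whose replicas disagree on numDocs."""
    by_shard = defaultdict(dict)
    for core, num_docs in counts_by_replica.items():
        # "collA_shard4_replica1" -> shard key "collA_shard4"
        shard = core.rsplit("_replica", 1)[0]
        by_shard[shard][core] = num_docs
    return {s: r for s, r in by_shard.items() if len(set(r.values())) > 1}

counts = {
    "collA_shard4_replica1": 28215608,  # figures from the report
    "collA_shard4_replica2": 28215609,
}
print(mismatched_shards(counts))
```

In practice the numDocs values would come from each core's Luke/stats endpoint; comparing them shard by shard narrows the problem to the one replica pair that diverged during recovery.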


was (Author: aheaven):
Got another error today on a 4-shard setup, each shard with 2 replicas (8 
nodes in total).

On shard 4 / replica 1 I see the following error: [^replica1.png]
On shard 4 / replica 2, this one: [^replica2.png]

Here's the backtrace for the error on the first screenshot:
{code}
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at 

[jira] [Updated] (SOLR-6875) No data integrity between replicas

2015-06-11 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-6875:
---
Attachment: replica2.png
replica1.png

Got another error today on a 4-shard setup, each shard with 2 replicas (8 
nodes in total).

On shard 4 / replica 1 I see the following error: [^replica1.png]
On shard 4 / replica 2, this one: [^replica2.png]

Here's the backtrace for the error on the first screenshot:
{code}
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{code}

After all this replica 1 shows:
{quote}
numDocs: 28 215 608
{quote}

And the replica 2 shows:
{quote}
numDocs: 28 215 609
{quote}

Everything worked well for a few months until yesterday, when we started to 
reindex some data (like 1.7m records).

Our Solr setup uses large pages and there are enough resources. Here's how 
we run the instances:
{code}
exec chpst -u solr java -Xms6G -Xmx8G -XX:+UseConcMarkSweepGC 
-XX:+UseLargePages -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled 
-XX:+UseLargePages -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=75 
-DzkHost=zoo5.devops:2181,zoo4.devops:2181,zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181
 -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf 
-Dbootstrap_conf=true -DnumShards=4 -jar start.jar etc/jetty.xml
{code}

The server has 16 CPU cores and SSD RAID 10; the load average is usually 
between 2 and 3. The charts also don't show anything suspicious in server 
load; it is very stable.

So it seems something went wrong during recovery after the network error. I'm 
not sure how to debug this deeper or what the warnings in the log mean, for 
example the last two messages on the first screenshot, from 
DistributedUpdateProcessor and CoreAdminHandler.

 No data integrity between replicas
 --

 Key: SOLR-6875
 URL: https://issues.apache.org/jira/browse/SOLR-6875
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.2
 Environment: One replica is @ Linux solr1.devops.wegohealth.com 
 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux
 Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic 
 #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 Solr is running with the next options:
 * -Xms12G
 * -Xmx16G
 * -XX:+UseConcMarkSweepGC
 * -XX:+UseLargePages
 * -XX:+CMSParallelRemarkEnabled
 * -XX:+ParallelRefProcEnabled
 * -XX:+UseLargePages
 * -XX:+AggressiveOpts
 * -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.
 

[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2015-03-02 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-5332:
---
Affects Version/s: 5.1

 Add preserve original setting to the EdgeNGramFilterFactory
 -

 Key: SOLR-5332
 URL: https://issues.apache.org/jira/browse/SOLR-5332
 Project: Solr
  Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6, 5.1
Reporter: Alexander S.

 Hi, as described here: 
 http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
  the problem is that if you have these 2 strings to index:
 1. facebook.com/someuser.1
 2. facebook.com/someveryandverylongusername
 and the edge ngram filter factory with min and max gram size settings of 2 
 and 25, search requests for these URLs will fail.
 But search requests for:
 1. facebook.com/someuser
 2. facebook.com/someveryandverylonguserna
 will work properly.
 It's because the first URL ends with "1", which is shorter than the allowed 
 min gram size, and in the second URL the user name (27 characters) is longer 
 than the max gram size.
 Would be good to have a "preserve original" option that adds the original 
 string to the index when it does not fit the allowed gram sizes, so that the 
 "1" and "someveryandverylongusername" tokens will also be added to the index.
 Best,
 Alex
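The requested behavior (Lucene did later ship something like it as a preserveOriginal flag on EdgeNGramTokenFilter, if I recall correctly) can be sketched outside Lucene as follows; a toy single-token filter, not the real implementation:

```python
def edge_ngrams(token, min_gram=2, max_gram=25, preserve_original=False):
    """Emit front-anchored n-grams of lengths min_gram..max_gram;
    with preserve_original, also keep the whole token when its length
    falls outside that range, so it remains searchable as-is."""
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    if preserve_original and not min_gram <= len(token) <= max_gram:
        grams.append(token)
    return grams

# "1" is shorter than min_gram and the long user name exceeds max_gram;
# with preserve_original both survive as whole tokens.
print(edge_ngrams("1", preserve_original=True))
print(edge_ngrams("someveryandverylongusername", preserve_original=True))
```

Without the flag, "1" produces no grams at all and the 27-character name is only indexed up to its 25-character prefix, which is exactly the mismatch described in the issue.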






[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2015-03-02 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-5332:
---
Fix Version/s: 5.1

 Add preserve original setting to the EdgeNGramFilterFactory
 -

 Key: SOLR-5332
 URL: https://issues.apache.org/jira/browse/SOLR-5332
 Project: Solr
  Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.
 Fix For: 5.1


 Hi, as described here: 
 http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
  the problem is that if you have these 2 strings to index:
 1. facebook.com/someuser.1
 2. facebook.com/someveryandverylongusername
 and the edge ngram filter factory with min and max gram size settings of 2 
 and 25, search requests for these URLs will fail.
 But search requests for:
 1. facebook.com/someuser
 2. facebook.com/someveryandverylonguserna
 will work properly.
 It's because the first URL ends with "1", which is shorter than the allowed 
 min gram size, and in the second URL the user name (27 characters) is longer 
 than the max gram size.
 Would be good to have a "preserve original" option that adds the original 
 string to the index when it does not fit the allowed gram sizes, so that the 
 "1" and "someveryandverylongusername" tokens will also be added to the index.
 Best,
 Alex



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2015-03-02 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-5332:
---
Affects Version/s: (was: 5.1)

 Add preserve original setting to the EdgeNGramFilterFactory
 -

 Key: SOLR-5332
 URL: https://issues.apache.org/jira/browse/SOLR-5332
 Project: Solr
  Issue Type: Wish
Affects Versions: 4.4, 4.5, 4.5.1, 4.6
Reporter: Alexander S.
 Fix For: 5.1


 Hi, as described here: 
 http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
  the problem is that if you have these 2 strings to index:
 1. facebook.com/someuser.1
 2. facebook.com/someveryandverylongusername
 and the edge n-gram filter factory with min and max gram sizes of 2 and 
 25, search requests for these URLs will fail.
 But search requests for:
 1. facebook.com/someuser
 2. facebook.com/someveryandverylonguserna
 will work properly.
 That's because the first URL ends with 1, which is shorter than the allowed 
 min gram size, and in the second URL the user name is longer than the max 
 gram size (27 characters).
 It would be good to have a preserve original option that adds the original 
 token to the index when it does not fit the allowed gram sizes, so that the 
 1 and someveryandverylongusername tokens are also indexed.
 Best,
 Alex



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-7022) ERROR UpdateHandler java.lang.InterruptedException

2015-01-23 Thread Alexander S. (JIRA)
Alexander S. created SOLR-7022:
--

 Summary: ERROR UpdateHandler java.lang.InterruptedException
 Key: SOLR-7022
 URL: https://issues.apache.org/jira/browse/SOLR-7022
 Project: Solr
  Issue Type: Bug
 Environment: Solr 4.10.2, Ubuntu x86_64
Reporter: Alexander S.


What I did:
* Updated configs in zookeeper with zkcli.sh -cmd upconfig.
* Opened solr admin interface in the web browser
* Followed to core admin and reloaded the cores one by one

Backtrace:
{code}
java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
at java.util.concurrent.FutureTask.get(FutureTask.java:187)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:654)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{code}

I have done this before and didn't see such errors, but last time I had 
increased the caches too much, so the warm-up time for the query result cache 
was around 30 seconds. This time the core reload took much longer, and then 
this error appeared in the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6875) No data integrity between replicas

2015-01-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272877#comment-14272877
 ] 

Alexander S. edited comment on SOLR-6875 at 1/11/15 11:33 AM:
--

Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1:
  Replica 1: *14 486 089*
  Replica 2: *14 496 445*

Shard 2
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?


was (Author: aheaven):
Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1:
  Replica 1: 14 486 089
  Replica 2: 14 496 445

Shard 2
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?

 No data integrity between replicas
 --

 Key: SOLR-6875
 URL: https://issues.apache.org/jira/browse/SOLR-6875
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.2
 Environment: One replica is @ Linux solr1.devops.wegohealth.com 
 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux
 Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic 
 #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 Solr is running with the following options:
 * -Xms12G
 * -Xmx16G
 * -XX:+UseConcMarkSweepGC
 * -XX:+UseLargePages
 * -XX:+CMSParallelRemarkEnabled
 * -XX:+ParallelRefProcEnabled
 * -XX:+UseLargePages
 * -XX:+AggressiveOpts
 * -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.

 Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.
 Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, 
 and another (Solr1.1) 45 574 038 docs.
 Solr1 is the leader, these errors appeared in the logs:
 {code}
 ERROR - 2014-12-20 09:54:38.783; 
 org.apache.solr.update.StreamingSolrServers$1; error
 java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:196)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
 at 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
 at 
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
 at 
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
 at 
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
 at 
 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
 at 
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
 at 
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
 at 
 org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
 at 
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 WARN  - 2014-12-20 09:54:38.787; 
 org.apache.solr.update.processor.DistributedUpdateProcessor; 

[jira] [Commented] (SOLR-6875) No data integrity between replicas

2015-01-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272877#comment-14272877
 ] 

Alexander S. commented on SOLR-6875:


Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1:
  Replica 1: 14 486 089
  Replica 2: 14 496 445

Shard 2
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?

 No data integrity between replicas
 --

 Key: SOLR-6875
 URL: https://issues.apache.org/jira/browse/SOLR-6875
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.2
 Environment: One replica is @ Linux solr1.devops.wegohealth.com 
 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux
 Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic 
 #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 Solr is running with the following options:
 * -Xms12G
 * -Xmx16G
 * -XX:+UseConcMarkSweepGC
 * -XX:+UseLargePages
 * -XX:+CMSParallelRemarkEnabled
 * -XX:+ParallelRefProcEnabled
 * -XX:+UseLargePages
 * -XX:+AggressiveOpts
 * -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.

 Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.
 Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, 
 and another (Solr1.1) 45 574 038 docs.
 Solr1 is the leader, these errors appeared in the logs:
 {code}
 ERROR - 2014-12-20 09:54:38.783; 
 org.apache.solr.update.StreamingSolrServers$1; error
 java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:196)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
 at 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
 at 
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
 at 
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
 at 
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
 at 
 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
 at 
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
 at 
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
 at 
 org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
 at 
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 WARN  - 2014-12-20 09:54:38.787; 
 org.apache.solr.update.processor.DistributedUpdateProcessor; Error sending 
 update
 java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:196)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
 at 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
 at 
 

[jira] [Comment Edited] (SOLR-6875) No data integrity between replicas

2015-01-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272877#comment-14272877
 ] 

Alexander S. edited comment on SOLR-6875 at 1/11/15 11:33 AM:
--

Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1:
  Replica 1: 14 486 089
  Replica 2: 14 496 445

Shard 2
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?


was (Author: aheaven):
Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1:
  Replica 1: 14 486 089
  Replica 2: 14 496 445

Shard 2
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?

 No data integrity between replicas
 --

 Key: SOLR-6875
 URL: https://issues.apache.org/jira/browse/SOLR-6875
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.2
 Environment: One replica is @ Linux solr1.devops.wegohealth.com 
 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux
 Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic 
 #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 Solr is running with the following options:
 * -Xms12G
 * -Xmx16G
 * -XX:+UseConcMarkSweepGC
 * -XX:+UseLargePages
 * -XX:+CMSParallelRemarkEnabled
 * -XX:+ParallelRefProcEnabled
 * -XX:+UseLargePages
 * -XX:+AggressiveOpts
 * -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.

 Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.
 Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, 
 and another (Solr1.1) 45 574 038 docs.
 Solr1 is the leader, these errors appeared in the logs:
 {code}
 ERROR - 2014-12-20 09:54:38.783; 
 org.apache.solr.update.StreamingSolrServers$1; error
 java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:196)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
 at 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
 at 
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
 at 
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
 at 
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
 at 
 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
 at 
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
 at 
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
 at 
 org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
 at 
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 WARN  - 2014-12-20 09:54:38.787; 
 org.apache.solr.update.processor.DistributedUpdateProcessor; Error 

[jira] [Comment Edited] (SOLR-6875) No data integrity between replicas

2015-01-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272877#comment-14272877
 ] 

Alexander S. edited comment on SOLR-6875 at 1/11/15 11:33 AM:
--

Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1:
  Replica 1: 14 486 089
  Replica 2: 14 496 445

Shard 2
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?


was (Author: aheaven):
Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1:
  Replica 1: *14 486 089*
  Replica 2: *14 496 445*

Shard 2
  Replica 1: 14 496 609
  Replica 2: 14 496 609

Shard 3
  Replica 1: 14 492 812
  Replica 2: 14 492 812

Shard 4
  Replica 1: 14 488 755
  Replica 2: 14 488 755
{noformat}

How could this happen? We didn't see anything like this before the upgrade from 
4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?

 No data integrity between replicas
 --

 Key: SOLR-6875
 URL: https://issues.apache.org/jira/browse/SOLR-6875
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.2
 Environment: One replica is @ Linux solr1.devops.wegohealth.com 
 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 
 x86_64 x86_64 GNU/Linux
 Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic 
 #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
 Solr is running with the following options:
 * -Xms12G
 * -Xmx16G
 * -XX:+UseConcMarkSweepGC
 * -XX:+UseLargePages
 * -XX:+CMSParallelRemarkEnabled
 * -XX:+ParallelRefProcEnabled
 * -XX:+UseLargePages
 * -XX:+AggressiveOpts
 * -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.

 Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.
 Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, 
 and another (Solr1.1) 45 574 038 docs.
 Solr1 is the leader, these errors appeared in the logs:
 {code}
 ERROR - 2014-12-20 09:54:38.783; 
 org.apache.solr.update.StreamingSolrServers$1; error
 java.net.SocketException: Connection reset
 at java.net.SocketInputStream.read(SocketInputStream.java:196)
 at java.net.SocketInputStream.read(SocketInputStream.java:122)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
 at 
 org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
 at 
 org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
 at 
 org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
 at 
 org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
 at 
 org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
 at 
 org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
 at 
 org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
 at 
 org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
 at 
 org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
 at 
 org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
 at 
 org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
 at 
 org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
 at 
 org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 WARN  - 2014-12-20 09:54:38.787; 
 org.apache.solr.update.processor.DistributedUpdateProcessor; 

[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

2015-01-02 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262969#comment-14262969
 ] 

Alexander S. commented on SOLR-6494:


Correct, and that's exactly my case, because the time is entered by users and 
differs between queries. I'd love to have something like this working with the 
standard query parser:
{code}
fq={!cache=false cost=101}field:value
{code}
It seems that `cache=false` does actually work, but `cost` doesn't (some 
parsers, like the frange one, do treat and apply all queries with a `cost` 
higher than 100 as post filters).

 Query filters applied in a wrong order
 --

 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

 This query:
 {code}
 {
   fq: [type:Award::Nomination],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes just a few milliseconds, but this one:
 {code}
 {
   fq: [
 type:Award::Nomination,
 created_at_d:[* TO 2014-09-08T23:59:59Z]
   ],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes almost 15 seconds.
 I have just ≈12k documents with type Award::Nomination, but around half 
 a billion with the created_at_d field set. It seems Solr applies the 
 created_at_d filter first, going through all documents where this field is 
 set, which is not very smart.
 I think if it can't do anything better than applying filters in alphabetical 
 order, it should apply them in the order they were received.
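The performance asymmetry described in this issue can be sketched in plain Python (a toy model of the two evaluation orders, not Solr internals; the field names match the report):

```python
import random

random.seed(0)
N = 100_000
CUTOFF = 0.9
# One selective document in a thousand, everything carries a date value.
docs = [{"type": "Award::Nomination" if i % 1000 == 0 else "Other",
         "created_at_d": random.random()}
        for i in range(N)]

# Strategy A (the observed behavior): evaluate the range filter over the
# entire document set, then intersect with the selective type filter.
range_hits = {i for i, d in enumerate(docs) if d["created_at_d"] <= CUTOFF}
type_hits = {i for i, d in enumerate(docs) if d["type"] == "Award::Nomination"}
result_a = type_hits & range_hits  # N range evaluations

# Strategy B (what the reporter asks for): apply the cheap, selective
# filter first and run the range check only over its matches.
result_b = {i for i in type_hits if docs[i]["created_at_d"] <= CUTOFF}

assert result_a == result_b  # same result set, ~1000x fewer range checks in B
```

Both strategies return the same documents; the difference is only how many range evaluations are performed, which is the 15-second gap the report describes.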



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

2014-12-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259627#comment-14259627
 ] 

Alexander S. commented on SOLR-6494:


As I was told already, Solr does not apply filters incrementally; instead each 
filter runs through the entire data set, and Solr then caches the results. For 
filters that contain ranges the cache is not effective, especially when we need 
NRT search and commits are triggered multiple times per minute. In that case 
big caches make no sense, and big autowarming numbers cause Solr to fail. My 
point is that the cache is not always efficient, and for such cases Solr needs 
to use another strategy and apply filters incrementally (read: as post filters).

So this:
{quote}
By design, fq clauses like this are calculated for the entire document set and 
the results cached, there is no ordering for that part. Otherwise, how could 
they be re-used for a different query?
{quote}
does not work in all cases.

Something like this:
{code}
fq={!cache=false cost=101}field:value # to run as a post filter
{code}
would definitely solve the problem, but this is not supported.
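The requested semantics can be sketched as a toy model in Python (this is a hypothetical illustration of what `{!cache=false cost=101}` is being asked to mean; it is not how Solr's standard parser actually behaves, which is the point of the comment):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Filter:
    predicate: Callable[[dict], bool]
    cache: bool = True
    cost: int = 0

def apply_filters(docs, filters):
    """Toy model of the requested semantics: a filter with cache=False and
    cost > 100 runs last, over the already-narrowed result set, instead of
    being evaluated against the whole corpus."""
    post = [f for f in filters if not f.cache and f.cost > 100]
    pre = [f for f in filters if f not in post]
    hits = [d for d in docs if all(f.predicate(d) for f in pre)]
    return [d for d in hits if all(f.predicate(d) for f in post)]

docs = [{"type": "Award::Nomination", "created_at_d": 5},
        {"type": "Other", "created_at_d": 1}]
filters = [Filter(lambda d: d["type"] == "Award::Nomination"),
           # the hypothetical post filter: cache=false, cost > 100
           Filter(lambda d: d["created_at_d"] <= 10, cache=False, cost=101)]
assert apply_filters(docs, filters) == [docs[0]]
```

The expensive range predicate is only ever evaluated on documents that survived the cheap filters, and nothing about it is cached.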

The frange parser has support for this, but it is not always suitable and fails 
with various errors, like "can not use FieldCache on multivalued field: 
type", etc.

Does that look like a missing feature? I mean for me it definitely does, but 
could this be considered as a wish and implemented some day? How can Solr 
community help with missing features?

 Query filters applied in a wrong order
 --

 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

 This query:
 {code}
 {
   fq: [type:Award::Nomination],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes just a few milliseconds, but this one:
 {code}
 {
   fq: [
 type:Award::Nomination,
 created_at_d:[* TO 2014-09-08T23:59:59Z]
   ],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes almost 15 seconds.
 I have just ≈12k documents with type Award::Nomination, but around half 
 a billion with the created_at_d field set. It seems Solr applies the 
 created_at_d filter first, going through all documents where this field is 
 set, which is not very smart.
 I think if it can't do anything better than applying filters in alphabetical 
 order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6494) Query filters applied in a wrong order

2014-12-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259627#comment-14259627
 ] 

Alexander S. edited comment on SOLR-6494 at 12/28/14 12:50 PM:
---

As I was told already, Solr does not apply filters incrementally; instead each 
filter runs through the entire data set, and Solr then caches the results. For 
filters that contain ranges the cache is not effective, especially when we need 
NRT search and commits are triggered multiple times per minute. In that case 
big caches make no sense, and big autowarming numbers cause Solr to fail. My 
point is that the cache is not always efficient, and for such cases Solr needs 
to use another strategy and apply filters incrementally (read: as post filters).

So this:
{quote}
By design, fq clauses like this are calculated for the entire document set and 
the results cached, there is no ordering for that part. Otherwise, how could 
they be re-used for a different query?
{quote}
does not work in all cases.

Something like this:
{code}
# cost > 100 to run as a post filter, but something like post=true would be 
better, I think
fq={!cache=false cost=101}field:value
{code}
would definitely solve the problem, but this is not supported.

The frange parser has support for this, but it is not always suitable and fails 
with various errors, like "can not use FieldCache on multivalued field: 
type", etc.

Does that look like a missing feature? I mean for me it definitely does, but 
could this be considered as a wish and implemented some day? How can Solr 
community help with missing features?


was (Author: aheaven):
As I was told already, Solr does not apply filters incrementally; instead each 
filter runs through the entire data set, and Solr then caches the results. For 
filters that contain ranges the cache is not effective, especially when we need 
NRT search and commits are triggered multiple times per minute. In that case 
big caches make no sense, and big autowarming numbers cause Solr to fail. My 
point is that the cache is not always efficient, and for such cases Solr needs 
to use another strategy and apply filters incrementally (read: as post filters).

So this:
{quote}
By design, fq clauses like this are calculated for the entire document set and 
the results cached, there is no ordering for that part. Otherwise, how could 
they be re-used for a different query?
{quote}
does not work in all cases.

Something like this:
{code}
fq={!cache=false cost=101}field:value # to run as a post filter
{code}
would definitely solve the problem, but this is not supported.

The frange parser has support for this, but it is not always suitable and fails 
with various errors, like "can not use FieldCache on multivalued field: 
type", etc.

Does that look like a missing feature? I mean for me it definitely does, but 
could this be considered as a wish and implemented some day? How can Solr 
community help with missing features?

 Query filters applied in a wrong order
 --

 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

 This query:
 {code}
 {
   fq: [type:Award::Nomination],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes just a few milliseconds, but this one:
 {code}
 {
   fq: [
 type:Award::Nomination,
 created_at_d:[* TO 2014-09-08T23:59:59Z]
   ],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes almost 15 seconds.
 I have just ≈12k documents with type Award::Nomination, but around half 
 a billion with the created_at_d field set. It seems Solr applies the 
 created_at_d filter first, going through all documents where this field is 
 set, which is not very smart.
 I think if it can't do anything better than applying filters in alphabetical 
 order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

2014-12-26 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259150#comment-14259150
 ] 

Alexander S. commented on SOLR-6494:


Just an idea: what if Solr detected that a filter uses date ranges 
like [* TO 2014-09-08T23:59:59Z] (or probably any ranges where the cache is not 
very efficient), and, if there are other simpler filters in the query, applied 
such range filters last? And probably to the already fetched results, as a post 
filter? And probably avoided caching for this filter? That sounds like a good 
optimization to me. It would avoid evicting more useful filters from the 
cache, increase warming speed and, most importantly, increase the 
search speed. [~erickerickson] [~hossman]

Best,
Alex
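A partial manual workaround for this ordering does exist via the filter cost local param (the exact behavior depends on the Solr version, and the cost value here is illustrative): a filter tagged cache=false is executed in cost order alongside the other clauses instead of being materialized against the whole index first, so the cheap cached type filter can drive the match:
{code}
q=*:*
fq=type:Award::Nomination
fq={!cache=false cost=50}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}
With cost below 100 the non-cached filter participates in leapfrog iteration; with cost 100 or more it would only run as a true post filter if the query implements PostFilter.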







[jira] [Comment Edited] (SOLR-6494) Query filters applied in a wrong order

2014-12-26 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259150#comment-14259150
 ] 

Alexander S. edited comment on SOLR-6494 at 12/26/14 5:27 PM:
--

Just an idea: what if Solr detected that a filter uses date ranges 
like [* TO 2014-09-08T23:59:59Z] (or probably any ranges where the cache is not 
very efficient), and, if there are other simpler filters in the query, applied 
such range filters last? And probably to the already fetched results, as a post 
filter? And probably avoided caching for this filter? That sounds like a 
good optimization to me. It would avoid evicting more useful filters from the 
cache, increase warming speed and, most importantly, increase the 
search speed.

Like in the case above, if you have 200m docs but only 12k with 
type:Award::Nomination, and the query has 2 filters where one has a date range, Solr 
definitely can detect this and do the right thing instead of simply looping 
through all 200m documents with this cache-inefficient filter. Could this be at 
least considered as a wish and reopened?

[~erickerickson] [~hossman]

Best,
Alex









[jira] [Created] (SOLR-6875) No data integrity between replicas

2014-12-21 Thread Alexander S. (JIRA)
Alexander S. created SOLR-6875:
--

 Summary: No data integrity between replicas
 Key: SOLR-6875
 URL: https://issues.apache.org/jira/browse/SOLR-6875
 Project: Solr
  Issue Type: Bug
Reporter: Alexander S.


Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.

Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, and 
another (Solr1.1) 45 574 038 docs.

Solr1 is the leader, these errors appeared in the logs:
{code}
ERROR - 2014-12-20 09:54:38.783; org.apache.solr.update.StreamingSolrServers$1; 
error
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
WARN  - 2014-12-20 09:54:38.787; 
org.apache.solr.update.processor.DistributedUpdateProcessor; Error sending 
update
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at 
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at 
org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)

[jira] [Updated] (SOLR-6875) No data integrity between replicas

2014-12-21 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-6875:
---
  Environment: 
One replica is @ Linux solr1.devops.wegohealth.com 3.8.0-29-generic 
#42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 x86_64 x86_64 
GNU/Linux
Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic 
#30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Solr is running with the next options:
* -Xms12G
* -Xmx16G
* -XX:+UseConcMarkSweepGC
* -XX:+UseLargePages
* -XX:+CMSParallelRemarkEnabled
* -XX:+ParallelRefProcEnabled
* -XX:+UseLargePages
* -XX:+AggressiveOpts
* -XX:CMSInitiatingOccupancyFraction=75
Affects Version/s: 4.10.2


[jira] [Commented] (SOLR-6769) Election bug

2014-12-21 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255146#comment-14255146
 ] 

Alexander S. commented on SOLR-6769:


Correct, endless warming was causing this problem. So this is a bug in Solr: 
it waits for searchers to finish warming, which can take up to 5 minutes in some 
cases. The node itself goes down and does not accept connections, but the 
election does not happen.

 Election bug
 

 Key: SOLR-6769
 URL: https://issues.apache.org/jira/browse/SOLR-6769
 Project: Solr
  Issue Type: Bug
Reporter: Alexander S.
 Attachments: Screenshot 876.png


 Hello, I have a very simple set up: 2 shards and 2 replicas (4 nodes in 
 total).
 What I did is just stopped the shards, but while the first shard stopped 
 immediately, the second one took about 5 minutes to stop. You can see on the 
 screenshot what happened next. In short:
 1. Shard 1 stopped normally
 2. Replica 1 became a leader
 3. Shard 2 was still performing some job but wasn't accepting connections
 4. Replica 2 did not become a leader because Shard 2 was still there but 
 wasn't working
 5. The entire cluster went down until Shard 2 stopped and Replica 2 became a 
 leader
 Marked as critical because this shuts down the entire cluster. Please adjust 
 if I am wrong.






[jira] [Commented] (SOLR-6769) Election bug

2014-12-18 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252440#comment-14252440
 ] 

Alexander S. commented on SOLR-6769:


This might be related: 
http://lucene.472066.n3.nabble.com/Endless-100-CPU-usage-on-searcherExecutor-thread-td4175088.html







[jira] [Commented] (SOLR-6769) Election bug

2014-12-09 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239203#comment-14239203
 ] 

Alexander S. commented on SOLR-6769:


Hi, yes, my terminology about shards and replicas wasn't clear; let me explain 
it better.

* Solr: 4.8.1
* Java:
java version 1.7.0_51
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
* We have 5 servers, 2 of which are big (16 CPU cores, 48G of RAM each) and 3 
others are small (1 CPU and 1G of RAM). All servers have rapid SSD RAID 10. 
Each server runs a ZK instance, so we have 5 ZK instances in total. The big 
servers also run Solr: the first one runs 2 Solr instances and the second one 
runs their 2 replicas, so each shard has 2 replicas (the simplest SolrCloud 
setup from the wiki).

So the cluster looks like this:
{noformat}
* Small 1G node: ZK
* Small 1G node: ZK
* Small 1G node: ZK
* Big 16G node: ZK, Solr1, Solr2
* Big 16G node: ZK, Solr1.1, Solr2.1
{noformat}

Stopped manually means I tried to manually stop Solr1 and Solr2, which were 
the leaders, by sending a TERM signal (we have service files, so I did service 
stop and expected a graceful shutdown). This worked for Solr1: it went down 
normally and Solr1.1 became the leader instantly. Then I tried to do the same 
for Solr2, but once I sent the TERM it became inoperable yet didn't exit 
completely (orange on the screenshot); the process was still running for 
≈ 5-10 minutes and the election didn't happen. As a result I got "no node 
hosting shard" errors, but was expecting Solr2.1 to become the leader 
instantly, as happened with Solr1.1.

As I understand it, Solr2 didn't shut down instantly because there could 
be some background jobs, e.g. index merging, an in-process commit, etc., *but 
then it should not stop accepting connections and should not change its status 
to down* until all background jobs are finished and it is really ready to go 
down and pass leadership to Solr2.1.

It seems like a bug in Solr, because all services were working normally, all ZK 
instances were up and operable, and Solr itself wasn't under heavy load. 
Otherwise, could you please point me to any information about how 
to gracefully shut down instances? It would be good to have a button in the web 
UI to force a replica to become the leader with one click. Then I 
would be able to force Solr1.1 and Solr2.1 to become the leaders, wait until 
that happens, and safely reboot the Solr1 and Solr2 instances.

Best,
Alexander

 Election bug
 

 Key: SOLR-6769
 URL: https://issues.apache.org/jira/browse/SOLR-6769
 Project: Solr
  Issue Type: Bug
Reporter: Alexander S.
 Attachments: Screenshot 876.png


 Hello, I have a very simple set up: 2 shards and 2 replicas (4 nodes in 
 total).
 What I did is just stopped the shards, but if first shard stopped immediately 
 the second one took about 5 minutes to stop. You can see on the screenshot 
 what happened next. In short:
 1. Shard 1 stopped normally
 3. Replica 1 became a leader
 2. Shard 2 still was performing some job but wasn't accepting connection
 4. Replica 2 did not became a leader because Shard 2 is still there but 
 doesn't work
 5. Entire cluster went down until Shard 2 stopped and Replica 2 became a 
 leader
 Marked as critical because this shuts down the entire cluster. Please adjust 
 if I am wrong.






[jira] [Created] (SOLR-6769) Election bug

2014-11-20 Thread Alexander S. (JIRA)
Alexander S. created SOLR-6769:
--

 Summary: Election bug
 Key: SOLR-6769
 URL: https://issues.apache.org/jira/browse/SOLR-6769
 Project: Solr
  Issue Type: Bug
Reporter: Alexander S.
Priority: Critical


Hello, I have a very simple set up: 2 shards and 2 replicas (4 nodes in total).

What I did is just stopped the shards, but while the first shard stopped 
immediately, the second one took about 5 minutes to stop. You can see on the 
screenshot what happened next. In short:
1. Shard 1 stopped normally
2. Replica 1 became a leader
3. Shard 2 was still performing some job but wasn't accepting connections
4. Replica 2 did not become a leader because Shard 2 was still there but wasn't 
working
5. The entire cluster went down until Shard 2 stopped and Replica 2 became a leader

Marked as critical because this shuts down the entire cluster. Please adjust if 
I am wrong.






[jira] [Updated] (SOLR-6769) Election bug

2014-11-20 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-6769:
---
Attachment: Screenshot 876.png

[^Screenshot 876.png]

 Election bug
 

 Key: SOLR-6769
 URL: https://issues.apache.org/jira/browse/SOLR-6769
 Project: Solr
  Issue Type: Bug
Reporter: Alexander S.
Priority: Critical
 Attachments: Screenshot 876.png


 Hello, I have a very simple setup: 2 shards and 2 replicas (4 nodes in 
 total).
 What I did was simply stop the shards; the first shard stopped immediately, 
 but the second one took about 5 minutes to stop. You can see on the 
 screenshot what happened next. In short:
 1. Shard 1 stopped normally
 2. Shard 2 was still performing some job but wasn't accepting connections
 3. Replica 1 became the leader
 4. Replica 2 did not become the leader because Shard 2 was still registered 
 but not working
 5. The entire cluster went down until Shard 2 stopped and Replica 2 became 
 the leader
 Marked as critical because this shuts down the entire cluster. Please adjust 
 if I am wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

2014-09-12 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131495#comment-14131495
 ] 

Alexander S. commented on SOLR-6494:


So I've added a new field nominated_at_d to all docs with type 
Award::Nomination. Now this query:
{code}
{
  fq: [
type:Award::Nomination,
nominated_at_d:[* TO 2014-09-08T23:59:59Z]
  ],
  sort: score desc,
  start: 0,
  rows: 20,
  q: *:*
}
{code}
doesn't take longer than a few milliseconds.

The new nominated_at_d is the same kind of field as created_at_d; the only 
difference is that only ≈12k documents have the nominated_at_d field, while 
≈100m have created_at_d.

So again, I am saying that the current way Solr applies filters is not 
optimal; sometimes we need to skip the cache and apply filters incrementally 
so that each filter doesn't have to go through the entire collection. We could 
filter this way:
{code}
200m docs → filter (type:Award::Nomination) → 12k docs → filter 
(created_at_d:[* TO 2014-09-08T23:59:59Z]) → 500 docs
{code}
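The incremental pipeline above can be illustrated with a small Python sketch 
(hypothetical in-memory documents; this is not how Solr executes filters, 
which is exactly the point being argued):

```python
# Each filter only scans the documents that survived the previous
# filter, instead of evaluating every filter against the whole
# collection. Field names mirror the report above.

def apply_filters_incrementally(docs, filters):
    """Apply predicates one by one, narrowing the candidate set."""
    candidates = docs
    for f in filters:
        candidates = [d for d in candidates if f(d)]
    return candidates

docs = [
    {"type": "Award::Nomination", "created_at_d": "2014-09-01T00:00:00Z"},
    {"type": "Award::Nomination", "created_at_d": "2014-09-10T00:00:00Z"},
    {"type": "Post", "created_at_d": "2014-09-01T00:00:00Z"},
]

filters = [
    lambda d: d["type"] == "Award::Nomination",             # selective: 2 of 3 docs
    lambda d: d["created_at_d"] <= "2014-09-08T23:59:59Z",  # only scans those 2
]

result = apply_filters_incrementally(docs, filters)
```

The second predicate never sees documents rejected by the first, which is the 
behavior being requested.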

I don't think the *entire* Solr user community can do anything about this, but 
a few Solr developers could. Do I have to be a Solr expert to report 
bugs/feature gaps?

 Query filters applied in a wrong order
 --

 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

 This query:
 {code}
 {
   fq: [type:Award::Nomination],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes just a few milliseconds, but this one:
 {code}
 {
   fq: [
 type:Award::Nomination,
 created_at_d:[* TO 2014-09-08T23:59:59Z]
   ],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes almost 15 seconds.
 I have just ≈12k documents with type Award::Nomination, but around half 
 a billion with the created_at_d field set. And it seems Solr applies the 
 created_at_d filter first, going through all documents where this field is 
 set, which is not very smart.
 I think if it can't do anything better than applying filters in alphabetical 
 order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

2014-09-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129956#comment-14129956
 ] 

Alexander S. commented on SOLR-6494:


Added the schema and debug output here: 
http://lucene.472066.n3.nabble.com/Help-with-a-slow-filter-query-td4158159.html

 Query filters applied in a wrong order
 --

 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

 This query:
 {code}
 {
   fq: [type:Award::Nomination],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes just a few milliseconds, but this one:
 {code}
 {
   fq: [
 type:Award::Nomination,
 created_at_d:[* TO 2014-09-08T23:59:59Z]
   ],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes almost 15 seconds.
 I have just ≈12k documents with type Award::Nomination, but around half 
 a billion with the created_at_d field set. And it seems Solr applies the 
 created_at_d filter first, going through all documents where this field is 
 set, which is not very smart.
 I think if it can't do anything better than applying filters in alphabetical 
 order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130090#comment-14130090
 ] 

Alexander S. commented on SOLR-6468:


Just tried to add matchVersion but got this error:
{code}
null:org.apache.solr.common.SolrException: Unable to create core: crm-prod
at 
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:911)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:568)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Could not load core 
configuration for core crm-prod
at 
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
... 8 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for 
[schema.xml] fieldType words_ngram: Plugin init failure for [schema.xml] 
analyzer/filter: Error instantiating class: 
'org.apache.lucene.analysis.core.StopFilterFactory'. Schema file is 
/etc/solr/core2/schema.xml
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:616)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:166)
at 
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at 
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at 
org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:89)
at 
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
... 9 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for 
[schema.xml] fieldType words_ngram: Plugin init failure for [schema.xml] 
analyzer/filter: Error instantiating class: 
'org.apache.lucene.analysis.core.StopFilterFactory'
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
... 14 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for 
[schema.xml] analyzer/filter: Error instantiating class: 
'org.apache.lucene.analysis.core.StopFilterFactory'
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at 
org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:400)
at 
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
at 
org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 15 more
Caused by: org.apache.solr.common.SolrException: Error instantiating class: 
'org.apache.lucene.analysis.core.StopFilterFactory'
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:606)
at 
org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:382)
at 
org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:376)
at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
... 19 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:603)
... 22 more
Caused by: java.lang.IllegalArgumentException: Unknown parameters: 
{matchVersion=4.3}
at 
org.apache.lucene.analysis.core.StopFilterFactory.init(StopFilterFactory.java:91)
... 27 more
{code}

 Regression: StopFilterFactory doesn't work properly without 
 enablePositionIncrements=false
 

 Key: SOLR-6468
 URL: https://issues.apache.org/jira/browse/SOLR-6468
 Project: Solr
 

[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14130209#comment-14130209
 ] 

Alexander S. commented on SOLR-6468:


Thanks, it does work with luceneMatchVersion=4.3, but isn't that deprecated? 
Any chance of bringing back enablePositionIncrements?
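For reference, the attribute that worked here is luceneMatchVersion (not 
matchVersion), set on the filter itself; a hedged sketch of the filter line 
under that workaround (attribute placement assumed per Solr 4.x schema 
syntax):

```xml
<!-- Hypothetical sketch: pinning this one filter to pre-4.4 behavior
     so that enablePositionIncrements="false" is accepted again. -->
<filter class="solr.StopFilterFactory" words="url_stopwords.txt"
        ignoreCase="true" luceneMatchVersion="4.3"
        enablePositionIncrements="false" />
```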

 Regression: StopFilterFactory doesn't work properly without 
 enablePositionIncrements=false
 

 Key: SOLR-6468
 URL: https://issues.apache.org/jira/browse/SOLR-6468
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1, 4.9
Reporter: Alexander S.

 Setup:
 * Schema version is 1.5
 * Field config:
 {code}
 <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
 autoGeneratePhraseQueries="true">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
 ignoreCase="true" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>
 {code}
 * Stop words:
 {code}
 http 
 https 
 ftp 
 www
 {code}
 So very simple. In the index I have:
 * twitter.com/testuser
 All these queries do match:
 * twitter.com/testuser
 * com/testuser
 * testuser
 But none of these does:
 * https://twitter.com/testuser
 * https://www.twitter.com/testuser
 * www.twitter.com/testuser
 Debug output shows:
 "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
 But we need:
 "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
 Complete debug outputs:
 * a valid search: 
 http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
 * an invalid search: 
 http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
 The complete discussion and explanation of the problem is here: 
 http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
 I didn't find a clear explanation of how we can upgrade Solr; there's no 
 replacement or workaround for this, so this is not just a major change but 
 a major disrespect to all existing Solr users who rely on this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

2014-09-10 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14128322#comment-14128322
 ] 

Alexander S. commented on SOLR-6494:


Unfortunately that doesn't solve the problem completely; these queries take ≈7 
seconds instead of 15:
{code}
{!cache=false}type:Award::Nomination
{!cache=false cost=10}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}
Which is still not good, since I have only 11,974 docs matching 
type:Award::Nomination and 139,716,883 matching created_at_d:[* TO 
2014-09-08T23:59:59Z]. If the cost parameter tells Solr to apply the cheapest 
filters first, why does the query still take so long? It seems that even 
though Solr doesn't run them in parallel, the filters still don't know about 
each other and each goes through all docs. My point is that it would be much 
faster if Solr could run filters one by one, with each subsequent filter 
working not on the entire data set but on the results returned by the 
previous filter.

Also tried cost=100 to apply the filter as a post-filter, but nothing changes: 
same 7 seconds. The filter cache doesn't help here.
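For concreteness, the request described above could be composed like this (a 
sketch using only Python's standard library; the host and core name are 
hypothetical and nothing is actually sent):

```python
# Sketch: composing fq parameters with Solr local params to disable
# caching and assign costs, as in the comment above.
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("fq", "{!cache=false cost=10}type:Award::Nomination"),
    ("fq", "{!cache=false cost=100}created_at_d:[* TO 2014-09-08T23:59:59Z]"),
    ("rows", "20"),
]

# Hypothetical endpoint; urlencode handles the repeated fq keys.
url = "http://localhost:8983/solr/core2/select?" + urlencode(params)
```

Note that cost >= 100 only turns a filter into a true post-filter for query 
types implementing Solr's PostFilter interface (e.g. {!frange}); a plain range 
query is still evaluated against the whole index, which is consistent with 
the "nothing changes" observation above.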

So this:
 By design, fq clauses like this are calculated for the entire document set 
 and the results cached, there is no ordering for that part.
doesn't sound right to me. Sometimes we don't need to reuse filters (and 
sometimes even can't, e.g. the cost option requires cache=false).

In this use case the way Solr applies filters is more harmful than useful; 
I'd even say more than 600 times harmful. A query that wouldn't take more 
than a second in MySQL takes 15 seconds in a search engine that runs on fast 
SSD RAID 10, has a few shards and replicas, uses more than 160G of memory in 
total, and has ≈40 CPU cores.

Thus this sounds like a feature gap (at least). Please share your thoughts on 
this.

 Query filters applied in a wrong order
 --

 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

 This query:
 {code}
 {
   fq: [type:Award::Nomination],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes just a few milliseconds, but this one:
 {code}
 {
   fq: [
 type:Award::Nomination,
 created_at_d:[* TO 2014-09-08T23:59:59Z]
   ],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes almost 15 seconds.
 I have just ≈12k documents with type Award::Nomination, but around half 
 a billion with the created_at_d field set. And it seems Solr applies the 
 created_at_d filter first, going through all documents where this field is 
 set, which is not very smart.
 I think if it can't do anything better than applying filters in alphabetical 
 order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6494) Query filters applied in a wrong order

2014-09-09 Thread Alexander S. (JIRA)
Alexander S. created SOLR-6494:
--

 Summary: Query filters applied in a wrong order
 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.


This query:
{code}
{
  fq: [type:Award::Nomination],
  sort: score desc,
  start: 0,
  rows: 20,
  q: *:*
}
{code}
takes just a few milliseconds, but this one:
{code}
{
  fq: [
type:Award::Nomination,
created_at_d:[* TO 2014-09-08T23:59:59Z]
  ],
  sort: score desc,
  start: 0,
  rows: 20,
  q: *:*
}
{code}
takes almost 15 seconds.

I have just ≈12k documents with type Award::Nomination, but around half a 
billion with the created_at_d field set. And it seems Solr applies the 
created_at_d filter first, going through all documents where this field is 
set, which is not very smart.

I think if it can't do anything better than applying filters in alphabetical 
order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order

2014-09-09 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127304#comment-14127304
 ] 

Alexander S. commented on SOLR-6494:


Hi, thank you for the explanation, but I think sometimes (like in this case) it 
would be much more efficient to run filters one by one. It seems that the cost 
parameter should do what I need, e.g.:
{code}
{!cost=1}type:Award::Nomination
{!cost=10}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}

 Query filters applied in a wrong order
 --

 Key: SOLR-6494
 URL: https://issues.apache.org/jira/browse/SOLR-6494
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

 This query:
 {code}
 {
   fq: [type:Award::Nomination],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes just a few milliseconds, but this one:
 {code}
 {
   fq: [
 type:Award::Nomination,
 created_at_d:[* TO 2014-09-08T23:59:59Z]
   ],
   sort: score desc,
   start: 0,
   rows: 20,
   q: *:*
 }
 {code}
 takes almost 15 seconds.
 I have just ≈12k documents with type Award::Nomination, but around half 
 a billion with the created_at_d field set. And it seems Solr applies the 
 created_at_d filter first, going through all documents where this field is 
 set, which is not very smart.
 I think if it can't do anything better than applying filters in alphabetical 
 order, it should apply them in the order they were received.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-02 Thread Alexander S. (JIRA)
Alexander S. created SOLR-6468:
--

 Summary: Regression: StopFilterFactory doesn't work properly 
without enablePositionIncrements=false
 Key: SOLR-6468
 URL: https://issues.apache.org/jira/browse/SOLR-6468
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.9, 4.8.1
Reporter: Alexander S.


Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http 
https 
ftp 
www
{code}

So very simple. In the index I have:
* twitter.com/testuser

All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser

But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser

Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com zer0sleep\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com zer0sleep\")"
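The "?" hole in the parsed phrase query comes from position increments: 
removing a leading stop word advances the position of the following token. A 
minimal Python simulation of that behavior (a simplified token model, not 
Lucene's actual analyzer code):

```python
import re

STOPWORDS = {"http", "https", "ftp", "www"}

def analyze(text, keep_position_increments=True):
    """Tokenize roughly like PatternTokenizerFactory with pattern [^\\w]+,
    lowercase, and drop stop words. With position increments kept,
    a removed stop word leaves a hole before the next token."""
    tokens = [t.lower() for t in re.split(r"[^\w]+", text) if t]
    out, pos = [], 0
    for tok in tokens:
        if tok in STOPWORDS:
            if keep_position_increments:
                pos += 1  # the hole the phrase query renders as "?"
            continue
        out.append((tok, pos))
        pos += 1
    return out

with_incr = analyze("https://www.twitter.com/testuser")
without_incr = analyze("https://www.twitter.com/testuser",
                       keep_position_increments=False)
```

With increments kept, the surviving query tokens start at position 2 while the 
indexed "twitter.com/testuser" has them starting at 0, so the exact phrase 
query no longer matches; without increments the positions line up.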

Complete debug outputs:
* a valid search: 
http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: 
http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww

The complete discussion and explanation of the problem is here: 
http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html

I didn't find a clear explanation of how we can upgrade Solr; there's no 
replacement or workaround for this, so this is not just a major change but a 
major disrespect to all existing Solr users who rely on this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-02 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-6468:
---
Description: 
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http 
https 
ftp 
www
{code}

So very simple. In the index I have:
* twitter.com/testuser

All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser

But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser

Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"

Complete debug outputs:
* a valid search: 
http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: 
http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww

The complete discussion and explanation of the problem is here: 
http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html

I didn't find a clear explanation of how we can upgrade Solr; there's no 
replacement or workaround for this, so this is not just a major change but a 
major disrespect to all existing Solr users who rely on this feature.

  was:
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http 
https 
ftp 
www
{code}

So very simple. In the index I have:
* twitter.com/testuser

All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser

But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser

Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com zer0sleep\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com zer0sleep\")"

Complete debug outputs:
* a valid search: 
http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: 
http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww

The complete discussion and explanation of the problem is here: 
http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html

I didn't find a clear explanation of how we can upgrade Solr; there's no 
replacement or workaround for this, so this is not just a major change but a 
major disrespect to all existing Solr users who rely on this feature.


 Regression: StopFilterFactory doesn't work properly without 
 enablePositionIncrements=false
 

 Key: SOLR-6468
 URL: https://issues.apache.org/jira/browse/SOLR-6468
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1, 4.9
Reporter: Alexander S.

 Setup:
 * Schema version is 1.5
 * Field config:
 {code}
 <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
 autoGeneratePhraseQueries="true">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
 ignoreCase="true" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>
 {code}
 * Stop words:
 {code}
 http 
 https 
 ftp 
 www
 {code}
 So very simple. In the index I have:
 * twitter.com/testuser
 All these queries do match:
 * twitter.com/testuser
 * com/testuser
 * testuser
 But none of these does:
 * https://twitter.com/testuser
 * https://www.twitter.com/testuser
 * www.twitter.com/testuser
 Debug output shows:
 "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
 But we need:
 "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
 Complete debug outputs:
 * a valid search: 
 http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
 * an invalid search: 
 http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
 The complete discussion and explanation of the problem is here: 
 http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
 I didn't find a clear explanation of how we can upgrade Solr; there's no 
 replacement or workaround for this, so this is not just a major change but 
 a major disrespect to all existing Solr users who rely on this feature.



--
This message was sent by Atlassian JIRA

[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false

2014-09-02 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14118078#comment-14118078
 ] 

Alexander S. commented on SOLR-6468:


Correct, but isn't this behavior deprecated? I mean luceneMatchVersion=4.3? I 
was told this could get removed in 5.0 as well.

If I understand the problem correctly, enablePositionIncrements=false could 
generate wrong tokens for those who don't know how to use the option 
correctly? It seems it requires a suitable tokenizer, and 
solr.PatternTokenizerFactory in my example should work properly. So instead of 
removing the option, the problem with wrong tokens could be explained in the 
readme and the option kept for those who really need it. That makes more 
sense to me than simply removing it.

Anyway, is there any chance the option could be restored? My use case should 
clearly show how useful it might be. And I tried to google the problem; there 
are a lot of complaints about this, but no solutions.

 Regression: StopFilterFactory doesn't work properly without 
 enablePositionIncrements=false
 

 Key: SOLR-6468
 URL: https://issues.apache.org/jira/browse/SOLR-6468
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8.1, 4.9
Reporter: Alexander S.

 Setup:
 * Schema version is 1.5
 * Field config:
 {code}
 <fieldType name="words_ngram" class="solr.TextField" omitNorms="false" 
 autoGeneratePhraseQueries="true">
   <analyzer>
     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
     <filter class="solr.StopFilterFactory" words="url_stopwords.txt" 
 ignoreCase="true" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>
 {code}
 * Stop words:
 {code}
 http 
 https 
 ftp 
 www
 {code}
 So very simple. In the index I have:
 * twitter.com/testuser
 All these queries do match:
 * twitter.com/testuser
 * com/testuser
 * testuser
 But none of these does:
 * https://twitter.com/testuser
 * https://www.twitter.com/testuser
 * www.twitter.com/testuser
 Debug output shows:
 "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
 But we need:
 "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
 Complete debug outputs:
 * a valid search: 
 http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
 * an invalid search: 
 http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
 The complete discussion and explanation of the problem is here: 
 http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
 I didn't find a clear explanation of how we can upgrade Solr; there's no 
 replacement or workaround for this, so this is not just a major change but 
 a major disrespect to all existing Solr users who rely on this feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2014-08-11 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14092884#comment-14092884
 ] 

Alexander S. commented on SOLR-3274:


Hi, thanks for the response.

bq. Well you never know
I've checked the nodes' status; the 3rd node was online all the time and there 
was no load on it.

bq. In a 3-node ZK-cluster you need at least 2 healthy ZK-nodes connected with 
each other for the cluster to be operational.
That should be the problem, since the 2 other ZK instances might 
(theoretically) have been unavailable because of heavy load (they share nodes 
with the Solr instances). Both nodes have 16 CPU cores, 48G of memory and RAID 
10 (SSD), so I thought it would be hard to hit performance issues there. 
Anyway, adding a separate node with a 4th ZooKeeper instance might help, 
right?
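On the quorum question above, for what it's worth: ZooKeeper needs a strict 
majority of the ensemble, so a 4th instance raises the quorum to 3 without 
tolerating any additional failures; a quick sketch of the arithmetic:

```python
# ZooKeeper quorum arithmetic: a strict majority of the ensemble must
# be healthy and connected for the cluster to serve requests.
def quorum(n):
    return n // 2 + 1

# Failures each ensemble size survives: n - quorum(n).
tolerated = {n: n - quorum(n) for n in (3, 4, 5)}
# 3 and 4 nodes both survive only 1 failure; 5 nodes survive 2.
```

So a separate 5th instance (or moving an existing one off the loaded Solr 
hosts) would help more than a 4th.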

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-ALPHA
 Environment: Any
Reporter: Per Steffensen

 Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 
 Solr servers, running 28 slices of the same collection (collA) - all slices 
 have one replica (two shards all in all - leader + replica) - 56 cores all in 
 all (8 shards on each solr instance). But anyways...
 Besides the problem reported in SOLR-3273, the system seems to run fine under 
 high load for several hours, but eventually errors like the ones shown below 
 start to occur. I might be wrong, but they all seem to indicate some kind of 
 instability in the collaboration between Solr and ZooKeeper. I have to say 
 that I haven't been there to check ZooKeeper at the moment those 
 exceptions occur, but basically I don't believe the exceptions occur because 
 ZooKeeper is not running stably - at least when I go and check ZooKeeper 
 through other channels (e.g. my Eclipse ZK plugin) it is always accepting 
 my connection and generally seems to be doing fine.
 Exception 1) Often the first error we see in solr.log is something like this
 {code}
 Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
	at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 {code}
 I believe this error basically occurs because SolrZkClient.isConnected 
 reports false, which means that its internal keeper.getState does not 
 return ZooKeeper.States.CONNECTED. 
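A rough sketch of that gate, with illustrative names (ZkGate, updatesAllowed are not Solr's actual API; this is only a simplified model of the check described above):

```java
// Hypothetical, simplified model of the update gate described above:
// Solr rejects updates whenever SolrZkClient.isConnected() is false,
// i.e. whenever keeper.getState() != ZooKeeper.States.CONNECTED.
// ZkGate and updatesAllowed are illustrative names, not Solr's API.
public class ZkGate {
    enum States { CONNECTING, ASSOCIATING, CONNECTED, CLOSED, AUTH_FAILED }

    static boolean updatesAllowed(States keeperState) {
        return keeperState == States.CONNECTED;
    }

    public static void main(String[] args) {
        // Even a brief session flap (state CONNECTING) is enough to trigger
        // "Cannot talk to ZooKeeper - Updates are disabled."
        System.out.println(updatesAllowed(States.CONNECTED));   // true
        System.out.println(updatesAllowed(States.CONNECTING));  // false
    }
}
```

The point is that the check is on the client's *session state*, so transient disconnects surface as hard update failures even while the ZooKeeper ensemble itself is healthy.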

[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2014-08-08 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090519#comment-14090519
 ] 

Alexander S. commented on SOLR-3274:


Suffering from the same problem, which happens during high load on the nodes.

Our setup is pretty simple, 4 Solr instances (2 shards, 2 replicas) and 3 
zookeeper instances. Everything is running on 3 physical nodes:
* 1st node — 1 zookeeper instance
* 2nd node — 2 solr shards and 1 zookeeper
* 3rd node — 2 solr replicas and 1 zookeeper

We're running the Solr instances this way:
{code}
java -Xms2G -Xmx16G -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=80 \
  -DzkHost=zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181 \
  -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf \
  -Dbootstrap_conf=true -DnumShards=2 -jar start.jar etc/jetty.xml
{code}

And once the load increases we get:
{code}
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
	at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1306)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:981)
	at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
	at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:349)
	at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:278)
	at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
	at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:368)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
	at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
	at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
	at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
	at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:744)
{code}

It's simply impossible for all 3 zookeeper instances to go offline 
simultaneously. I understand that the 2nd and 3rd nodes could be overloaded 
because of Solr, but the 1st node runs just a single zookeeper instance and 
the load average on that node is close to zero.

Since there's always at least 1 stable ZK node, this seems like a 
communication/reliability bug in Solr.


 ZooKeeper related SolrCloud problems
 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-07-29 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077928#comment-14077928
 ] 

Alexander S. commented on SOLR-4787:


It seems join doesn't work as expected, please have a look: 
http://lucene.472066.n3.nabble.com/Search-results-inconsistency-when-using-joins-td4149810.html

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 than the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will set up a fixed thread pool with six threads to handle all hjoin 
 requests. Once the thread pool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1&qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 {code}
 <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
 {code}
 And the join contrib lib jars must be registered in the solrconfig.xml.
 {code}
 <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
 {code}
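The core idea of the hjoin described above can be sketched in a few lines: gather the long join keys matched in the fromIndex into a hash set, then keep only main-query docs whose "to" key is in that set. This is a hypothetical, simplified in-memory model (HjoinSketch and its method names are illustrative, not the contrib's actual classes):

```java
import java.util.*;

public class HjoinSketch {
    // Illustrative model of the hjoin mechanism, not Solr's implementation:
    // fromKeys = join keys returned by the fromIndex query;
    // each main-query doc is {docid, toKey}; keep docs whose toKey is in the set.
    static List<long[]> join(Set<Long> fromKeys, List<long[]> mainDocs) {
        List<long[]> hits = new ArrayList<>();
        for (long[] doc : mainDocs)
            if (fromKeys.contains(doc[1]))  // O(1) hash lookup per main-query doc
                hits.add(doc);
        return hits;
    }

    public static void main(String[] args) {
        Set<Long> fromKeys = new HashSet<>(Arrays.asList(5L, 7L));
        List<long[]> mainDocs = Arrays.asList(
                new long[]{1, 5}, new long[]{2, 6}, new long[]{3, 7});
        System.out.println(join(fromKeys, mainDocs).size());  // 2 docs pass the filter
    }
}
```

The hash set is exactly the memory structure the description mentions: it trades extra heap for constant-time key lookups, which is why hjoin needs more memory than the JoinQParserPlugin but scales to millions of keys.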
 After issuing the ant dist command 

[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-29 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012449#comment-14012449
 ] 

Alexander S. commented on SOLR-5463:


I have another idea about the cursors implementation. That's just an idea, I'm 
not sure whether it's possible to do.

Is it possible to use cursors together with the start and rows parameters? That 
would allow using pagination and drawing links for prev, next, 1, 2, 3, n+1 
pages, as we can do now. So instead of using cursorMark we'd use cursorName, 
which could be static. The request start:0, rows:10, cursorName:* would return 
the first page of results and a static cursor name, which could then be used 
for all other pages (i.e. start:10, rows:10, cursorName:#{received_cursor_name}).

Does that make sense?

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP based API similar to how IndexSearcher.searchAfter works 
 at the lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param) except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}
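The client loop from the Basic Usage panel can be sketched as a minimal in-memory model. The real exchange is over HTTP against the select handler; here a sorted list of integer ids stands in for the index, and CursorDemo, page and fetchAll are illustrative names, not any Solr API. The cursor is the last sort value seen, and the loop stops when nextCursorMark stops changing:

```java
import java.util.*;

public class CursorDemo {
    // Server side (simulated): return up to `rows` docs with id greater than
    // the cursor value; "*" marks the beginning, as in cursorMark=*.
    static List<Integer> page(List<Integer> sortedIndex, String cursor, int rows) {
        int after = cursor.equals("*") ? Integer.MIN_VALUE : Integer.parseInt(cursor);
        List<Integer> out = new ArrayList<>();
        for (int id : sortedIndex)
            if (id > after && out.size() < rows) out.add(id);
        return out;
    }

    // Client side: resend the request with the returned nextCursorMark until
    // it stops changing, collecting every page along the way.
    static List<Integer> fetchAll(List<Integer> sortedIndex, int rows) {
        List<Integer> all = new ArrayList<>();
        String cursor = "*";
        while (true) {
            List<Integer> p = page(sortedIndex, cursor, rows);
            all.addAll(p);
            String next = p.isEmpty() ? cursor : String.valueOf(p.get(p.size() - 1));
            if (next.equals(cursor)) break;  // nextCursorMark unchanged: done
            cursor = next;
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> index = Arrays.asList(1, 2, 3, 5, 8, 13, 21);
        System.out.println(fetchAll(index, 3));  // every doc, in sort order
    }
}
```

Because the cursor encodes sort values rather than an offset, each page is fetched without rescanning the skipped documents, which is the whole point of deep paging.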



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010881#comment-14010881
 ] 

Alexander S. commented on SOLR-5463:


The inability to use this without sorting by a unique key (e.g. id) makes this 
feature useless. The same could be achieved previously by sorting by id and 
searching for docs where id is greater / less than the last received. See how 
cursors work in MongoDB, that's the right direction.




[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010883#comment-14010883
 ] 

Alexander S. commented on SOLR-5463:


http://docs.mongodb.org/manual/core/cursors/




[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010888#comment-14010888
 ] 

Alexander S. commented on SOLR-5463:


Sorry for spamming, but I can't edit my previous message. I just found that in 
Mongo cursors also aren't isolated and can return duplicates; I thought they 
were. But sorting docs by id is not acceptable in 99% of use cases, especially 
in Solr, where results are more commonly expected to be sorted by relevance.




[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011226#comment-14011226
 ] 

Alexander S. commented on SOLR-5463:


Oh, that's awesome, thanks for the tip.

 Provide cursor/token based searchAfter support that works with arbitrary 
 sorting (ie: deep paging)
 --

 Key: SOLR-5463
 URL: https://issues.apache.org/jira/browse/SOLR-5463
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man
Assignee: Hoss Man
 Fix For: 4.7, 5.0

 Attachments: SOLR-5463-randomized-faceting-test.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, 
 SOLR-5463__straw_man__MissingStringLastComparatorSource.patch


 I'd like to revisit a solution to the problem of deep paging in Solr, 
 leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works 
 at the Lucene level: require the clients to provide back a token indicating 
 the sort values of the last document seen on the previous page.  This is 
 similar to the cursor model I've seen in several other REST APIs that 
 support pagination over large sets of results (notably the Twitter API and 
 its since_id param) except that we'll want something that works with 
 arbitrary multi-level sort criteria that can be either ascending or descending.
 SOLR-1726 laid some initial groundwork here and was committed quite a while 
 ago, but the key bit of argument parsing to leverage it was commented out due 
 to some problems (see comments in that issue).  It's also somewhat out of 
 date at this point: at the time it was committed, IndexSearcher only supported 
 searchAfter for simple scores, not arbitrary field sorts; and the params 
 added in SOLR-1726 suffer from this limitation as well.
 ---
 I think it would make sense to start fresh with a new issue with a focus on 
 ensuring that we have deep paging which:
 * supports arbitrary field sorts in addition to sorting by score
 * works in distributed mode
 {panel:title=Basic Usage}
 * send a request with {{sort=X&start=0&rows=N&cursorMark=*}}
 ** sort can be anything, but must include the uniqueKey field (as a tie 
 breaker) 
 ** N can be any number you want per page
 ** start must be 0
 ** \* denotes you want to use a cursor starting at the beginning mark
 * parse the response body and extract the (String) {{nextCursorMark}} value
 * Replace the \* value in your initial request params with the 
 {{nextCursorMark}} value from the response in the subsequent request
 * repeat until the {{nextCursorMark}} value stops changing, or you have 
 collected as many docs as you need
 {panel}
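The Basic Usage steps above can be sketched as a small client-side loop. This is an illustrative sketch, not a real Solr client: `fetch_page` is a hypothetical stand-in for whatever HTTP call you make to `/select` with the current cursorMark, assumed to return the page's docs plus the response's `nextCursorMark`.

```python
def fetch_all(fetch_page):
    """Drain a result set via cursorMark deep paging.

    fetch_page(cursor) is a hypothetical wrapper around a Solr request like
      /select?q=...&sort=id+asc&start=0&rows=N&cursorMark=<cursor>
    and must return (docs_on_page, next_cursor_mark).
    """
    docs, cursor = [], "*"            # "*" denotes the beginning mark
    while True:
        page, next_cursor = fetch_page(cursor)
        docs.extend(page)
        if next_cursor == cursor:     # nextCursorMark stopped changing: done
            return docs
        cursor = next_cursor
```

The loop terminates exactly as the steps describe: it keeps substituting the returned nextCursorMark until two consecutive requests yield the same mark.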



--
This message was sent by Atlassian JIRA
(v6.2#6252)




[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)

2014-05-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012084#comment-14012084
 ] 

Alexander S. commented on SOLR-5463:


If, as David mentioned, Solr adds it only when it is not already there, this 
should keep the ability for users to manually specify another key and order 
when that is required (which seems to be a rare case).







[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores

2014-04-13 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967927#comment-13967927
 ] 

Alexander S. commented on SOLR-5871:


I already asked at solr-u...@lucene.apache.org, but it seems the only option 
currently is to read the debug explanation. Unfortunately I am not a Java 
developer and thus unable to create a patch, but Solr's JIRA has a Wish issue 
type, so I posted my wish here.

 Ability to see the list of fields that matched the query with scores
 

 Key: SOLR-5871
 URL: https://issues.apache.org/jira/browse/SOLR-5871
 Project: Solr
  Issue Type: Wish
Reporter: Alexander S.
Assignee: Erick Erickson

 Hello, I need the ability to tell users what content matched their query, 
 this way:
 | Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
 | John Doe | Yes             | No     | Yes        | No               | Yes          |
 | Jane Doe | No              | Yes    | No         | No               | Yes          |
 All these columns are indexed text fields, and I need to know what content 
 matched the query; it would also be cool to show the score per field.
 As far as I know, there's currently no way to return this information when 
 running a query request. Debug output is suitable for visual review but has 
 lots of nesting levels and is hard to understand.
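Until such a feature exists, the matched fields can be scraped from the debug explanation mentioned above. The following is a rough, illustrative heuristic rather than a full parser of Lucene's explain format: it simply collects the field names appearing in weight(field:term ...) clauses.

```python
import re

def matched_fields(explain_text):
    """Extract field names from 'weight(field:term ...)' clauses in a
    Lucene explain() string. Heuristic only: it ignores nesting, scores,
    and any clause shapes other than weight(...)."""
    return sorted(set(re.findall(r"weight\((\w+):", explain_text)))
```

Applied to the debugQuery explanation of one document, this yields a flat list of fields that contributed to its score, which is enough to fill in the Yes/No table sketched above.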






[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores

2014-04-07 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961730#comment-13961730
 ] 

Alexander S. commented on SOLR-5871:


Any chance this could be reviewed by someone?







[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-04-07 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961733#comment-13961733
 ] 

Alexander S. edited comment on SOLR-4787 at 4/7/14 9:10 AM:


@Kranti Parisa, hi, any luck with this?


was (Author: aheaven):
@Kranti Parisa, hi, any lick with this?

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 than the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1&qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1 applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list will be included in the results.
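The local-params string above is easy to get wrong by hand, so a tiny helper can compose it. The helper itself is illustrative and not part of the contrib; only the parameter names (fromIndex, from, to, threads, fq) come from the description above.

```python
def hjoin_fq(from_index, from_field, to_field, query, threads=None, nested_fq=None):
    """Compose an hjoin filter query in Solr local-params syntax.

    Illustrative string builder; the parameter names mirror the hjoin
    local params described above."""
    parts = [f"!hjoin fromIndex={from_index}", f"from={from_field}", f"to={to_field}"]
    if threads is not None:
        parts.append(f"threads={threads}")
    if nested_fq is not None:
        parts.append(f"fq={nested_fq}")   # may reference another param, e.g. "$qq"
    return "{" + " ".join(parts) + "}" + query
```

For example, hjoin_fq("collection2", "id_i", "id_i", "user:customer1", threads=6, nested_fq="$qq") reproduces the fq value in the example filter query above.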
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin.
 <queryParser name="hjoin" 
 class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
 And the join contrib lib jars must be registered in the solrconfig.xml.
  <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
 After issuing the ant dist command from inside the solr directory, the joins 
 contrib jar will appear in the solr/dist directory.
 

[jira] [Commented] (SOLR-4787) Join Contrib

2014-04-07 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961733#comment-13961733
 ] 

Alexander S. commented on SOLR-4787:


@Kranti Parisa, hi, any lick with this?


[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-21 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943153#comment-13943153
 ] 

Alexander S. commented on SOLR-4787:


Kranti Parisa

Did you try to apply this patch to 4.7.0? I downloaded it from 
http://www.apache.org/dyn/closer.cgi/lucene/solr/4.7.0 and then ran the 
following steps:
* ant compile
* ant ivy-bootstrap
* ant dist
I then created a package for my Linux distribution, but no luck; Solr fails 
to initialize with
<queryParser name="hjoin" 
class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>


[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-19 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940740#comment-13940740
 ] 

Alexander S. commented on SOLR-4787:


Any query fails; it seems I am doing something wrong (perhaps the patch was 
applied incorrectly). I see this error:
{quote}
SolrCore Initialization Failures
crm-dev: 
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Error loading class 'org.apache.solr.search.joins.HashSetJoinQParserPlugin'
{quote}
when trying to access the web interface.


[jira] [Created] (SOLR-5871) Ability to see the list of fields that matched the query with scores

2014-03-17 Thread Alexander S. (JIRA)
Alexander S. created SOLR-5871:
--

 Summary: Ability to see the list of fields that matched the query 
with scores
 Key: SOLR-5871
 URL: https://issues.apache.org/jira/browse/SOLR-5871
 Project: Solr
  Issue Type: Wish
Reporter: Alexander S.


Hello, I need the ability to show users what content matched their query, this 
way:
| Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes             | No     | Yes        | No               | Yes          |
| Jane Doe | No              | Yes    | No         | No               | Yes          |

All these columns are indexed text fields, and I need to know what content 
matched the query; it would also be cool to show the score per field.

As far as I know, there's currently no way to return this information when 
running a query request. Debug output is suitable for visual review but has 
lots of nesting levels and is hard to understand.






[jira] [Updated] (SOLR-5871) Ability to see the list of fields that matched the query with scores

2014-03-17 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-5871:
---

Description: 
Hello, I need the ability to tell users what content matched their query, this 
way:
| Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes             | No     | Yes        | No               | Yes          |
| Jane Doe | No              | Yes    | No         | No               | Yes          |

All these columns are indexed text fields, and I need to know what content 
matched the query; it would also be cool to show the score per field.

As far as I know, there's currently no way to return this information when 
running a query request. Debug output is suitable for visual review but has 
lots of nesting levels and is hard to understand.

  was:
Hello, I need the ability to show users what content matched their query, this 
way:
| Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes             | No     | Yes        | No               | Yes          |
| Jane Doe | No              | Yes    | No         | No               | Yes          |

All these columns are indexed text fields, and I need to know what content 
matched the query; it would also be cool to show the score per field.

As far as I know, there's currently no way to return this information when 
running a query request. Debug output is suitable for visual review but has 
lots of nesting levels and is hard to understand.








[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937732#comment-13937732
 ] 

Alexander S. commented on SOLR-4787:


Thank you, Kranti Parisa. I am far from Java development; how can I apply this 
patch and build Solr for Linux? I tried to apply the patch (it creates a new 
folder joins in solr/contrib), installed Ivy, and launched ant compile but got 
this error:
{quote}
common.compile-core:
[mkdir] Created dir: 
/home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] Compiling 3 source files to 
/home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 
/home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:883:
 error: reached end of file while parsing
[javac]   return this.delegate.acceptsDocsOutOfOrder();
[javac]^
[javac] 
/home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:884:
 error: reached end of file while parsing
[javac] 2 errors
[javac] 1 warning

BUILD FAILED
/home/heaven/Desktop/solr-4.7.0/build.xml:106: The following error occurred 
while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:458: The following error 
occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:449: The following error 
occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:471: The following 
error occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:1736: Compile failed; 
see the compiler error output for details.

Total time: 8 minutes 55 seconds
{quote}

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.8

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 than the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will setup a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex, then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1&qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1, applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query where the to field is present in 
 the from list will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin:
 <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
 And the join contrib lib jars must be registered in the solrconfig.xml:
 <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
 After issuing the ant dist command from inside the solr directory, the joins 
 contrib jar will appear in the solr/dist directory.
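The hash-set join described above can be illustrated outside Solr with a toy sketch; all field names and documents here are invented for illustration and this mirrors only the idea of hjoin, not its actual implementation:

```python
# Toy sketch of a hash-set join: collect int join keys from the "from" side,
# then filter the main query's results by membership.
from_docs = [  # e.g. results of user:customer1 in the fromIndex
    {"id_i": 1}, {"id_i": 2}, {"id_i": 5},
]
main_docs = [  # e.g. results of the main query
    {"id_i": 2, "name": "a"}, {"id_i": 3, "name": "b"}, {"id_i": 5, "name": "c"},
]

# Build the membership structure once; sizing it up front (what the "size"
# local parameter does) avoids rehashing when the fromIndex result set is large.
join_keys = {d["id_i"] for d in from_docs}

joined = [d for d in main_docs if d["id_i"] in join_keys]
print(joined)  # [{'id_i': 2, 'name': 'a'}, {'id_i': 5, 'name': 'c'}]
```

The memory cost is the key set itself, which is why hjoin needs more memory than the standard JoinQParserPlugin but can answer large joins quickly.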

[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937747#comment-13937747
 ] 

Alexander S. commented on SOLR-4787:


Never mind, there were 3 missing } at the end of HashSetJoinQParserPlugin.java. 
The build was successful; testing now.


[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937845#comment-13937845
 ] 

Alexander S. commented on SOLR-4787:


Kranti,

Do I need to update anything in my Solr config/schema? I've just tried the 
patched version, and it still ignores the fq parameter. I was using Solr 4.7.0.

Thanks,
Alex


[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937887#comment-13937887
 ] 

Alexander S. commented on SOLR-4787:


Hi, I am using a simple join, like this: {!join from=profile_ids_im to=id_i 
fq=$joinFilter1 v=$joinQuery1}.
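Local-parameter references such as $joinFilter1 above are resolved from sibling request parameters. As a sketch, a query string for such a request could be assembled like this; the filter and query values are borrowed from other examples in this thread, and no Solr endpoint is contacted:

```python
# Sketch: build a URL-encoded query string where the join dereferences
# $joinFilter1 and $joinQuery1 as separate request parameters.
from urllib.parse import urlencode

params = {
    "q": "{!join from=profile_ids_im to=id_i fq=$joinFilter1 v=$joinQuery1}",
    "joinFilter1": "type:Site",
    "joinQuery1": "{!edismax}my_keywords",
}
query_string = urlencode(params)
print(query_string)  # q=%7B%21join+from%3Dprofile_ids_im+... (URL-encoded)
```

Keeping the filter and query in separate parameters makes the nested pieces reusable and avoids escaping problems inside the local-params block.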


[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937921#comment-13937921
 ] 

Alexander S. commented on SOLR-4787:


Ok, thanks, I'll try with hjoin. And yes, I am trying to do it on the same core.


[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-17 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938117#comment-13938117
 ] 

Alexander S. commented on SOLR-4787:


Getting this error:
{code}
RSolr::Error::Http - 500 Internal Server Error
Error: {msg=SolrCore 'crm-dev' is not available due to init failure: Error 
loading class 
'org.apache.solr.search.joins.HashSetJoinQParserPlugin',trace=org.apache.solr.common.SolrException:
 SolrCore 'crm-dev' is not available due to init failure: Error loading class 
'org.apache.solr.search.joins.HashSetJoinQParserPlugin'
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:309)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
{code}


[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-05 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920798#comment-13920798
 ] 

Alexander S. commented on SOLR-4787:


Hi Joel, thanks. It seems I need to perform a nested join inside a single 
collection, but I need fq inside the join, as shown here: 
https://issues.apache.org/jira/browse/SOLR-4787?focusedCommentId=13750854&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13750854

I have a single collection with a type field that determines the kind of 
document. There are 3 types of documents: Profile, Site, and SiteSource.
When searching for Profiles I have to look in SiteSource content, so I need 
something like this:
{code}
q = {!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1} # Profile → 
Site join
joinQuery1 = {!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2} # 
Site → SiteSource join
joinQuery2 = {!edismax}my_keywords
joinFilter1 = type:Site
joinFilter2 = type:SiteSource
{code}
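If the nested fq support worked as described, the chain above would resolve keyword matches in SiteSource documents up through Sites to Profiles. A toy, pure-Python simulation of that intent follows; all documents, ids, and values are invented to mirror the schema described:

```python
# Toy simulation of the nested join: Profile <- Site <- SiteSource,
# with the type filters applied at each level.
docs = [
    {"id_i": 1, "type": "Profile"},
    {"id_i": 10, "type": "Site", "owner_id_im": [1]},
    {"id_i": 100, "type": "SiteSource", "site_id_i": 10, "content": "my_keywords"},
    {"id_i": 2, "type": "Profile"},
]

# joinQuery2 = {!edismax}my_keywords, filtered by joinFilter1 = type:SiteSource
sources = [d for d in docs
           if d["type"] == "SiteSource" and "my_keywords" in d.get("content", "")]
# inner join: from=site_id_i to=id_i, filtered by type:Site
site_ids = {d["site_id_i"] for d in sources}
sites = [d for d in docs if d["type"] == "Site" and d["id_i"] in site_ids]
# outer join: from=owner_id_im to=id_i, keeping only Profile documents
owner_ids = {o for d in sites for o in d["owner_id_im"]}
profiles = [d for d in docs if d["type"] == "Profile" and d["id_i"] in owner_ids]
print([d["id_i"] for d in profiles])  # [1]
```

This is only a model of the intended semantics; the reported problem is that the fq local parameter inside {!join} was being ignored, so the type filters never applied.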

Right now this works only partially; fq inside {!join} is ignored.
When can we expect this patch to be merged? Also, will it work the way I've 
explained, or do I misunderstand it?

Thank you,
Alex

 Join Contrib
 

 Key: SOLR-4787
 URL: https://issues.apache.org/jira/browse/SOLR-4787
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
 Fix For: 4.7

 Attachments: SOLR-4787-deadlock-fix.patch, 
 SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
 SOLR-4797-hjoin-multivaluekeys-trunk.patch


 This contrib provides a place where different join implementations can be 
 contributed to Solr. This contrib currently includes 3 join implementations. 
 The initial patch was generated from the Solr 4.3 tag. Because of changes in 
 the FieldCache API this patch will only build with Solr 4.2 or above.
 *HashSetJoinQParserPlugin aka hjoin*
 The hjoin provides a join implementation that filters results in one core 
 based on the results of a search in another core. This is similar in 
 functionality to the JoinQParserPlugin but the implementation differs in a 
 couple of important ways.
 The first way is that the hjoin is designed to work with int and long join 
 keys only. So, in order to use hjoin, int or long join keys must be included 
 in both the to and from core.
 The second difference is that the hjoin builds memory structures that are 
 used to quickly connect the join keys. So, the hjoin will need more memory 
 then the JoinQParserPlugin to perform the join.
 The main advantage of the hjoin is that it can scale to join millions of keys 
 between cores and provide sub-second response time. The hjoin should work 
 well with up to two million results from the fromIndex and tens of millions 
 of results from the main query.
 The hjoin supports the following features:
 1) Both lucene query and PostFilter implementations. A *cost* > 99 will 
 turn on the PostFilter. The PostFilter will typically outperform the Lucene 
 query when the main query results have been narrowed down.
 2) With the lucene query implementation there is an option to build the 
 filter with threads. This can greatly improve the performance of the query if 
 the main query index is very large. The threads parameter turns on 
 threading. For example *threads=6* will use 6 threads to build the filter. 
 This will set up a fixed threadpool with six threads to handle all hjoin 
 requests. Once the threadpool is created the hjoin will always use it to 
 build the filter. Threading does not come into play with the PostFilter.
 3) The *size* local parameter can be used to set the initial size of the 
 hashset used to perform the join. If this is set above the number of results 
 from the fromIndex then you can avoid hashset resizing, which improves 
 performance.
 4) Nested filter queries. The local parameter fq can be used to nest a 
 filter query within the join. The nested fq will filter the results of the 
 join query. This can point to another join to support nested joins.
 5) Full caching support for the lucene query implementation. The filterCache 
 and queryResultCache should work properly even with deep nesting of joins. 
 Only the queryResultCache comes into play with the PostFilter implementation 
 because PostFilters are not cacheable in the filterCache.
 The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
 plugin is referenced by the string hjoin rather than join.
 fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
 fq=$qq\}user:customer1&qq=group:5
 The example filter query above will search the fromIndex (collection2) for 
 user:customer1, applying the local fq parameter to filter the results. The 
 lucene filter query will be built using 6 threads. This query will generate a 
 list of values from the from field that will be used to filter the main 
 query. Only records from the main query, where the to field is present in 
 the from list, will be included in the results.
 The solrconfig.xml in the main query core must contain the reference to the 
 hjoin:
 <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
 And the join contrib lib jars must be registered in the solrconfig.xml:
 <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
 After issuing the ant dist command from inside the solr directory the joins 
 contrib jar will appear in the solr/dist directory. Place the 
 solr-joins-4.*-.jar in the WEB-INF/lib directory of the solr webapplication.
 
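The hjoin filter query above can be assembled as ordinary Solr request parameters. A minimal sketch in Python: the collection name, field names, and query values are the ones from the example; the wrapping code itself is illustrative, not part of the contrib.

```python
from urllib.parse import urlencode

# Sketch only: builds the example hjoin request as a URL query string.
# collection2, id_i, user:customer1, and group:5 come from the description
# above; the q=*:* main query is an assumption for illustration.
params = {
    "q": "*:*",
    "fq": "{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1",
    "qq": "group:5",
}
query_string = urlencode(params)
print(query_string)
```

Note how the nested fq refers to the `$qq` parameter by name; Solr resolves it from the same request, which is what makes the nesting composable.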

[jira] [Comment Edited] (SOLR-4787) Join Contrib

2014-03-05 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920798#comment-13920798
 ] 

Alexander S. edited comment on SOLR-4787 at 3/5/14 12:36 PM:
-

Hi Joel, thanks. It seems I need to perform a nested join inside a single 
collection, but I need fq inside the join, as shown here: 
https://issues.apache.org/jira/browse/SOLR-4787?focusedCommentId=13750854&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13750854

I have a single collection with a field type which determines the kind of 
document. 3 types of documents: Profile, Site, and SiteSource.
When searching for Profiles I have to look in SiteSource content, so I need 
something like this:
{code}
q = {!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1} # Profile → 
Site join
joinQuery1 = {!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2} # 
Site → SiteSource join
joinQuery2 = {!edismax}my_keywords
joinFilter1 = type:Site
joinFilter2 = type:SiteSource
{code}

Right now this works only partially: fq inside \{!join\} is ignored.
When can we expect this patch to be merged? Also, will it work in the way I've 
explained, or do I understand it wrong?

Thank you,
Alex
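The nested-join request in the comment above can be sketched as a flat parameter map; the join fields and type values are the ones from the comment, while the core name and endpoint are hypothetical.

```python
from urllib.parse import urlencode

# Sketch of the nested Profile -> Site -> SiteSource join described above.
# owner_id_im, site_id_i, and the type:* filters come from the comment;
# localhost:8983 and collection1 are illustrative assumptions.
params = {
    "q": "{!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1}",
    "joinQuery1": "{!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2}",
    "joinQuery2": "{!edismax}my_keywords",
    "joinFilter1": "type:Site",
    "joinFilter2": "type:SiteSource",
}
request_url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(request_url)
```

Each `$name` reference is resolved against the other parameters of the same request, so the inner join and its filter stay independent, reusable pieces.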




[jira] [Commented] (SOLR-4787) Join Contrib

2014-03-05 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920805#comment-13920805
 ] 

Alexander S. commented on SOLR-4787:


Hi, 4.4 and 4.7


[jira] [Commented] (SOLR-4787) Join Contrib

2014-02-28 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915768#comment-13915768
 ] 

Alexander S. commented on SOLR-4787:


Just tried 4.7.0 and it does not work either.


[jira] [Commented] (SOLR-4787) Join Contrib

2014-02-26 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912786#comment-13912786
 ] 

Alexander S. commented on SOLR-4787:


Which release has support for {!join} with the fq parameter? I was trying 
with 4.5.1, but fq does not seem to have any effect.


[jira] [Commented] (LUCENE-4963) Deprecate broken TokenFilter constructors

2013-12-06 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841236#comment-13841236
 ] 

Alexander S. commented on LUCENE-4963:
--

Hi, how are we now supposed to fix this?
http://stackoverflow.com/questions/18668376/solr-4-4-stopfilterfactory-and-enablepositionincrements

 Deprecate broken TokenFilter constructors
 -

 Key: LUCENE-4963
 URL: https://issues.apache.org/jira/browse/LUCENE-4963
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
 Fix For: 4.4

 Attachments: LUCENE-4963.patch


 We have some TokenFilters which are only broken with specific options. This 
 includes:
  * TrimFilter when updateOffsets=true
  * StopFilter, JapanesePartOfSpeechStopFilter, KeepWordFilter, LengthFilter, 
 TypeTokenFilter when enablePositionIncrements=false
 I think we should deprecate these behaviors in 4.4 and remove them in trunk.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4963) Deprecate broken TokenFilter constructors

2013-12-06 Thread Alexander S. (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841239#comment-13841239
 ] 

Alexander S. commented on LUCENE-4963:
--

I index
twitter.com/testuser
then search for
http://www.twitter.com/testuser

These are in the stopwords filter:
http
https
www

No results.
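The mismatch above can be illustrated outside Lucene. The sketch below is not Lucene code: it only mimics how preserving position increments leaves gaps where stopwords were removed, so the surviving tokens sit at different positions in the indexed text and the query, and a phrase match fails. The tokenization rule is a crude stand-in for the real tokenizer.

```python
# Illustrative only: stopword removal with preserved position increments.
stopwords = {"http", "https", "ftp", "www"}

def analyze(text):
    # Crude tokenizer: split URLs on punctuation (stand-in for the real one).
    tokens = text.replace("/", " ").replace(".", " ").replace(":", " ").split()
    out, pos = [], 0
    for t in tokens:
        pos += 1  # the position advances for every token, removed or not
        if t.lower() not in stopwords:
            out.append((t.lower(), pos))
    return out

indexed = analyze("twitter.com/testuser")             # tokens start at position 1
queried = analyze("http://www.twitter.com/testuser")  # same tokens, shifted positions
print(indexed, queried)
```

Because the remaining tokens carry different positions, a position-sensitive phrase query built from the second string cannot match the first.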







[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-5332:
---

Affects Version/s: 4.4

 Add preserve original setting to the EdgeNGramFilterFactory
 -

 Key: SOLR-5332
 URL: https://issues.apache.org/jira/browse/SOLR-5332
 Project: Solr
  Issue Type: Wish
Affects Versions: 4.4
Reporter: Alexander S.

 Hi, as described here: 
 http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html
  the problem is that if you have these 2 strings to index:
 1. facebook.com/someuser.1
 2. facebook.com/someveryandverylongusername
 and the edge ngram filter factory with min and max gram size settings 2 and 
 25, search requests for these urls will fail.
 But search requests for:
 1. facebook.com/someuser
 2. facebook.com/someveryandverylonguserna
 will work properly.
 It's because the first url has "1" at the end, which is lower than the allowed 
 min gram size. In the second url the user name is longer than the max gram 
 size (27 characters).
 Would be good to have a "preserve original" option, that will add the 
 original string to the index if it does not fit the allowed gram size, so 
 that "1" and "someveryandverylongusername" tokens will also be added to the 
 index.
 Best,
 Alex
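The requested behavior can be sketched in a few lines. The min/max gram sizes (2 and 25) come from the report; `preserve_original` is the proposed option, not an existing Solr parameter at the time of this issue.

```python
# Illustrative sketch of edge n-gramming with a "preserve original" option.
def edge_ngrams(token, min_gram=2, max_gram=25, preserve_original=False):
    # Emit prefixes from min_gram up to max_gram characters.
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    if preserve_original and (len(token) < min_gram or len(token) > max_gram):
        grams.append(token)  # keep the original so exact searches still match
    return grams

edge_ngrams("1")                          # the token is shorter than min_gram and is lost
edge_ngrams("1", preserve_original=True)  # the original token is kept
```

With `preserve_original`, both the too-short "1" and the 27-character user name survive as whole tokens alongside their grams, which is exactly what the report asks for.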






[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-11-27 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-5332:
---

Affects Version/s: 4.6
   4.5
   4.5.1







[jira] [Created] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-10-10 Thread Alexander S. (JIRA)
Alexander S. created SOLR-5332:
--

 Summary: Add preserve original setting to the 
EdgeNGramFilterFactory
 Key: SOLR-5332
 URL: https://issues.apache.org/jira/browse/SOLR-5332
 Project: Solr
  Issue Type: Wish
Reporter: Alexander S.








[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory

2013-10-10 Thread Alexander S. (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander S. updated SOLR-5332:
---









[jira] [Commented] (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2012-08-26 Thread Alexander S. (JIRA)
Alexander S.
 commented on  SOLR-874


Dismax parser exceptions on trailing OPERATOR
Hi, sorry for asking this here, but is the following error related to this issue?

Aug 26, 2012 8:22:33 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:134)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:279)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse '"admission";"adolescent";"adrenal gland disorders";"adrenocortical carcinoma";"adrenoleukodystrophy see leukodystrophies";"advocacy";"afd";"affordability";"african american health";"africaso";"aga";"aganglionic megacolon";"aggressive mastocytosis";"aging";"agranulocytic angina";"agu";"agyria";"ahc";"ahd";"ahds";"ahus";"aicardi syndrome";"aids";"aids and infections";"aids and pregnancy";"': Lexical error at line 1, column 391.  Encountered: EOF after : ""
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)
... 21 more
Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 391.  Encountered: EOF after : ""
at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229)
at org.apache.lucene.queryParser.QueryParser.jj_scan_token(QueryParser.java:1733)
at org.apache.lucene.queryParser.QueryParser.jj_3R_2(QueryParser.java:1616)
at org.apache.lucene.queryParser.QueryParser.jj_3_1(QueryParser.java:1623)
at org.apache.lucene.queryParser.QueryParser.jj_2_1(QueryParser.java:1609)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1288)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1274)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
... 24 more


And this one also looks very similar
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3C007b01ccf78e$9171c1f0$b45545d0$@gmail.com%3E

Best,
Alex

This message is automatically generated by JIRA.
If you think it was 

[jira] [Comment Edited] (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2012-08-26 Thread Alexander S. (JIRA)
Alexander S.
 edited a comment on  SOLR-874


Dismax parser exceptions on trailing OPERATOR

Hi, sorry for asking this here, but is the following error related to this issue?

Aug 26, 2012 8:36:24 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:134)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:279)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse '"hgps" "hhho" "hhrh"  ...truncated...  "kidney stones" "kidney transplant" "kidney trafq=type:Tweet': Lexical error at line 1, column 6783.  Encountered: EOF after : "\"kidney trafq=type:Tweet"
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79)
at org.apache.solr.search.QParser.getQuery(QParser.java:143)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)
... 21 more
Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 6783.  Encountered: EOF after : "\"kidney trafq=type:Tweet"
at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229)
at org.apache.lucene.queryParser.QueryParser.jj_ntk(QueryParser.java:1772)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1555)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1317)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1274)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
... 24 more


"kidney trafq=" should be "kidney transplantation" fq='type:Tweet', so it looks like the query string was truncated.

And this one also looks very similar
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3C007b01ccf78e$9171c1f0$b45545d0$@gmail.com%3E

Best,
Alex

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (SOLR-1862) CLONE -java.io.IOException: read past EOF

2010-04-02 Thread Alexander S (JIRA)
CLONE -java.io.IOException: read past EOF
-

 Key: SOLR-1862
 URL: https://issues.apache.org/jira/browse/SOLR-1862
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Alexander S
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 1.5


A query with relevancy scores of all zeros produces an invalid doclist that 
includes the sentinel value 2147483647, causing Solr to request that invalid 
docid from Lucene, which results in a java.io.IOException: read past EOF.
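The shape of the defect can be illustrated with a toy filter. This is not the actual fix (the real one lives inside Solr's result collection), just a sketch showing why the sentinel value must never leak into a doclist; `filter_doclist` is a hypothetical name.

```python
# Integer.MAX_VALUE, used internally as a priority-queue sentinel.
SENTINEL_DOC_ID = 2**31 - 1

def filter_doclist(doc_ids):
    """Drop sentinel ids that leaked into a doclist; asking Lucene for
    docid 2147483647 is what triggers the 'read past EOF' above."""
    return [d for d in doc_ids if d != SENTINEL_DOC_ID]

assert filter_doclist([3, 2147483647, 7]) == [3, 7]
```

In the bug, all-zero relevancy scores let these sentinel entries survive into the returned doclist instead of being discarded.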

http://search.lucidimagination.com/search/document/2d5359c0e0d103be/java_io_ioexception_read_past_eof_after_solr_1_4_0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.