Re: Fq and termfrequency are not showing the correct results

2017-04-04 Thread Ayush Gupta
Thanks for the reply. Actually I've always used termfreq to get word counts
for 2-, 3- and 4-word keywords.

We use shingles to accomplish this (ShingleFilterFactory).
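
A rough SolrJ sketch of that approach, assuming a hypothetical copy field
body_shingles analyzed with solr.ShingleFilterFactory so that multi-word
phrases are indexed as single terms that termfreq() can count (URL and field
names are illustrative, not from the original setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShingleTermfreqExample {
  public static void main(String[] args) throws Exception {
    SolrQuery q = new SolrQuery("body:\"bachelor's degree\"");
    // termfreq() only matches single indexed terms; on a shingled field the
    // two-word phrase is itself one term, so it can be counted directly.
    q.setFields("body", "body_frequency:termfreq(body_shingles,\"bachelor degree\")");
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      QueryResponse rsp = client.query(q);
      System.out.println(rsp.getResults());
    }
  }
}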

On Wed, Apr 5, 2017 at 12:17 AM, Erick Erickson 
wrote:

> Functions like termfreq operate on single terms post analysis
>
> Since it's an analyzed field you have no _term_ "bachelor's degree" or
> even "bachelor degree" in the field. You have two terms, "bachelor"
> and "degree". This also assumes that by "zero results" you mean you
> get no frequency information back.
>
> Best,
> Erick
>
> On Tue, Apr 4, 2017 at 2:19 AM, Ayush Gupta  wrote:
> > Hi Everyone,
> >
> > I have a document that contains data like this "Bachelor's degree is
> easier
> > to get" in the 'body' field and I am making a query on this field
> searching
> > for word 'Bachelor's degree' like this -
> > query?fq=body:"bachelor%27s%20degree"&fl=body_frequency:
> termfreq(body,"bachelor%27s%20degree"),body
> > and I am getting zero results in response even when I have documents that
> > contains words like 'Bachelor's degree'.
> >
> > I checked in the admin panel Analysis tab; there I can see the
> > WordDelimiterFilterFactory applied to the word 'Bachelor's Degree',
> > converting it to 'Bachelor degree'. So in both the Field Value (Query)
> > and Field Value (Index) analysis the WordDelimiterFilterFactory converts
> > 'Bachelor's Degree' to 'Bachelor degree', so why am I getting zero
> > results when querying? I have attached the screenshots of my analysis
> > page.
> >
> >
> > I have attached a code file 'code.txt' where you can see the code for the
> > field 'body'.
> >
> >
> > Please tell me what I am doing wrong.
> >
> > Thanks
> >
> >
> > algoscale technologies private limited
>

-- 

*algoscale technologies private limited*


RE: How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Ketan Thanki
Thanks Anshum,

I have some understanding of it now. I need to implement implicit routing so
that documents are inserted into and retrieved from a specific shard based on
the id which I use as the router field. I have tried making changes in
core.properties, but it doesn't work, so could you please let me know what
config changes are needed?

Please do the needful.

Regards,
Ketan.

-Original Message-
From: Anshum Gupta [mailto:ans...@anshumgupta.net] 
Sent: Tuesday, April 04, 2017 9:05 PM
To: solr-user@lucene.apache.org
Subject: Re: How to Apply 'implicit' routing in exist collection in solr 6.1.0

Hi Ketan,

I just want to be sure about your understanding of the 'implicit' router.

Implicit router in Solr puts the onus of correctly routing the documents on the 
user, instead of 'implicitly' or automatically routing them.
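
For illustration, a rough SolrJ sketch (collection, config set and field names
are made up; the builder-style client is the 6.x API). Since the router is
fixed when a collection is created, this creates a new collection with the
implicit router and then routes a document to a named shard through the
router field:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.common.SolrInputDocument;

public class ImplicitRoutingSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder()
        .withZkHost("localhost:2181").build()) {

      // Router type cannot be changed on an existing collection, so create a new
      // one with router.name=implicit; shard names are listed explicitly.
      CollectionAdminRequest.Create create = CollectionAdminRequest
          .createCollectionWithImplicitRouter("routedcollection", "myconfig",
              "shard1,shard2,shard3,shard4", 1);
      create.setRouterField("routeField");
      create.process(client);

      // With router.field set, a document lands on the shard whose name equals
      // the value of that field -- the caller decides, Solr just obeys.
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      doc.addField("routeField", "shard2");
      client.add("routedcollection", doc);
      client.commit("routedcollection");
    }
  }
}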

-Anshum

On Tue, Apr 4, 2017 at 2:01 AM Ketan Thanki  wrote:

>
> Hi,
>
> Need help with how to apply 'implicit' routing to existing collections.
> e.g.: I have configured 2 collections, each with 4 shards and 4
> replicas, so what changes should I make to apply 'implicit' routing?
>
> Please do the needful with some examples.
>
> Regards,
> Ketan.
>
> [CC Award Winners!]
>
>


Re: Number of shards - Best practice

2017-04-04 Thread Walter Underwood
> On Apr 4, 2017, at 7:38 PM, Muhammad Imad Qureshi 
>  wrote:
> 
> Hi
> I was recently told that ideally the number of shards in a SOLR cluster 
> should be equal to a power of 2. If this is indeed a best practice, then what 
> is the rationale behind this recommendation? Thanks, Imad

I don’t know of any such recommendation. Assuming you are not RAM or disk 
limited, going to two or three shards won’t help a lot. If those get you out of 
a bottleneck, you’ll see a difference.

I believe that some of the performance of Solr is proportional to the number of 
distinct terms in the index (the vocabulary). A rule of thumb is the vocabulary 
is proportional to the square root of the number of terms in the index. Which 
is often related to the number of documents. With this assumption, four shards 
gives a 2X speedup. Which has worked for me. 
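
A back-of-the-envelope sketch of the arithmetic behind that estimate, under
the stated assumption that query cost tracks per-shard vocabulary size:

    V(N) ~ sqrt(N)   =>   V(N/4) = sqrt(N/4) = sqrt(N) / 2

so each of four shards works against roughly half the vocabulary, which is
where the rough 2X figure comes from.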

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



Solrj HttpSolrServer retryHandler

2017-04-04 Thread Lasitha Wattaladeniya
Hi folks,

Is there an API to implement a retryHandler in HttpSolrServer?

I'm using solrj 4.10.4
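
One approach (a sketch, not verified against 4.10.4 specifically) is to build
the underlying Apache HttpClient yourself with the retry handler you want,
since HttpSolrServer accepts a caller-supplied HttpClient; the URL below is
made up:

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpRequestRetryHandler;
import org.apache.http.impl.client.HttpClients;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class RetryingSolrServerSketch {
  public static void main(String[] args) {
    // Configure retries on the HttpClient (3 attempts, retry even if the request
    // was already sent) and hand it to HttpSolrServer.
    HttpClient httpClient = HttpClients.custom()
        .setRetryHandler(new DefaultHttpRequestRetryHandler(3, true))
        .build();
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1", httpClient);
    // ... use server as usual; the retry handler now applies to its HTTP calls.
  }
}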

Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com


Number of shards - Best practice

2017-04-04 Thread Muhammad Imad Qureshi
Hi
I was recently told that ideally the number of shards in a SOLR cluster should 
be equal to a power of 2. If this is indeed a best practice, then what is the 
rationale behind this recommendation? Thanks, Imad

RE: Solr performance issue on indexing

2017-04-04 Thread Allison, Timothy B.
>  Also we will try to decouple Tika from Solr.
+1
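
For anyone landing here later, a rough sketch of that decoupling: run Tika in
the client JVM and send only the extracted fields to Solr. The field names,
file path and core URL below are illustrative, not from the original setup:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class ClientSideTikaSketch {
  public static void main(String[] args) throws Exception {
    // Parse the file with Tika in the client JVM instead of inside Solr.
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
    Metadata metadata = new Metadata();
    try (InputStream in = Files.newInputStream(Paths.get("/path/to/file.pdf"))) {
      parser.parse(in, handler, metadata);
    }

    // Send only the fields you care about; Solr never sees the raw binary.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "file-1");
    doc.addField("content", handler.toString());
    doc.addField("content_type", metadata.get(Metadata.CONTENT_TYPE));

    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      client.add(doc);
      client.commit();
    }
  }
}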


-Original Message-
From: tstusr [mailto:ulfrhe...@gmail.com] 
Sent: Friday, March 31, 2017 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr performance issue on indexing

Hi, thanks for the feedback.

Yes, it is about OOM; indeed, the Solr instance even becomes unavailable. As I
was saying, I can't find more relevant information in the logs.

We are able to increase the JVM heap, so the first thing we'll do will be
that.

As far as I know, all documents are bounded to that amount (14K); just the
processing could change. We are running some tests on indexing and it seems to
work without concurrent threads. Also we will try to decouple Tika from Solr.

By the way, will making it available with SolrCloud improve performance? Or
will there be no perceptible improvement?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: JSON facet bucket list not correct with sharded query

2017-04-04 Thread Karthik Ramachandran
Since the attachment was removed sending the code.

import java.util.List;
import java.util.Random;
import java.util.UUID;

import org.apache.solr.client.solrj.SolrRequest.METHOD;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CommonParams;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.ShardParams;
import org.apache.solr.common.util.NamedList;

public class JsonFacetPagingTest {
  public static void main(String[] args) throws Throwable {
    final String SOLR_URL = "http://localhost:8983/solr";
JsonFacetPagingTest tests = new JsonFacetPagingTest();
// Uncomment below to add docs to the core.
// tests.addDocumentsToCore(SOLR_URL, "fileduplicate01", 0, 4);
// tests.addDocumentsToCore(SOLR_URL, "fileduplicate02", 0, 600);
// tests.addDocumentsToCore(SOLR_URL, "fileduplicate02", 100, 600);
// tests.addDocumentsToCore(SOLR_URL, "fileduplicate02", 200, 600);
// tests.addDocumentsToCore(SOLR_URL, "fileduplicate02", 300, 600);
// tests.addDocumentsToCore(SOLR_URL, "fileduplicate02", 400, 600);
// tests.addDocumentsToCore(SOLR_URL, "fileduplicate02", 700, 800);

// Uncomment below to run the queries in pages.
// tests.testPaging(SOLR_URL, "fileduplicate01", 15);
  }

  protected void addDocumentsToCore(String solrURL, String coreName, int 
startIndex, int numberOfRecords)
  throws Exception {
int endIndex = startIndex + numberOfRecords;
if (numberOfRecords > 0 && endIndex > startIndex) {
  Random ran = new Random();
  HttpSolrClient client = new HttpSolrClient.Builder(solrURL).build();
  for (int index = startIndex; index <= endIndex; ++index) {
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", UUID.randomUUID().toString());
doc.addField("filename", "filename-" + index);
doc.addField("size", (1024L * ran.nextInt()));
client.add(coreName, doc);
  }
  client.commit(coreName);
  client.close();
}
  }

  @SuppressWarnings("unchecked")
  protected void testPaging(String solrURL, String coreName, int limit) throws 
Exception {
Long offset = 0L;
Long numBuckets = 0L;
List buckets = null;
    String facet =
        "{'duplicates':{'type':'terms','field':'filename','limit':%d,'offset':%d,'mincount':2,'numBuckets':true,'sort':'sum desc','facet':{'sum':'sum(size)'}}}";
HttpSolrClient client = new HttpSolrClient.Builder(solrURL).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set(CommonParams.Q, "*:*");
params.set(CommonParams.START, String.valueOf(CommonParams.START_DEFAULT));
params.set(CommonParams.ROWS, String.valueOf(CommonParams.START_DEFAULT));
params.set(ShardParams.SHARDS, solrURL + "/fileduplicate01," + solrURL + 
"/fileduplicate02");
do {
  params.set("json.facet", String.format(facet, limit, offset));
  QueryResponse queryResponse = client.query(coreName, params, METHOD.POST);
  if (queryResponse != null && queryResponse.getResponse() != null) {
NamedList facets = (NamedList) 
queryResponse.getResponse().get("facets");
NamedList duplicates = (NamedList) 
facets.get("duplicates");
numBuckets = ((Number) duplicates.get("numBuckets")).longValue();
buckets = (List) duplicates.get("buckets");
        System.out.println(String.format("Result for Offset:%4d ==> Number of Buckets:%4d, Bucket Size:%4d, vals:%s",
            offset, numBuckets, buckets.size(), buckets));
offset += limit;
  }
} while (buckets != null && buckets.size() != 0 && offset <= numBuckets);
client.close();
  }
}

With Thanks & Regards
Karthik Ramachandran


From: Karthik Ramachandran
Sent: Tuesday, April 4, 2017 8:32 PM
To: 'solr-user@lucene.apache.org' 
Subject: JSON facet bucket list not correct with sharded query

We are using JSON facets to list files that are duplicates (mincount: 2) in
pages; after 2-3 pages we don't get any results even though there are more
results.

Schema:
  
  
  

Query:
http://localhost:8983/solr/fileduplicate01/select/?wt=json&q=*:*&start=0&rows=0&shards=localhost:8983/solr/fileduplicate01,localhost:8983/solr/fileduplicate02&json.facet={
"duplicates":{"type":"terms","field":"filename","limit":15,"offset":0,"mincount":2,"numBuckets":true,"sort":"sum
 desc","facet": {"sum":"sum(size)"}}}

Create 2 cores named fileduplicate01 and fileduplicate02 with the same schema
and run the attached Java code to populate the data and run the query.

Any help is appreciated.


With Thanks & Regards
Karthik Ramachandran


JSON facet bucket list not correct with sharded query

2017-04-04 Thread Karthik Ramachandran
We are using JSON facets to list files that are duplicates (mincount: 2) in
pages; after 2-3 pages we don't get any results even though there are more
results.

Schema:
  
  
  

Query:
http://localhost:8983/solr/fileduplicate01/select/?wt=json&q=*:*&start=0&rows=0&shards=localhost:8983/solr/fileduplicate01,localhost:8983/solr/fileduplicate02&json.facet={
"duplicates":{"type":"terms","field":"filename","limit":15,"offset":0,"mincount":2,"numBuckets":true,"sort":"sum
 desc","facet": {"sum":"sum(size)"}}}

Create 2 cores named fileduplicate01 and fileduplicate02 with the same schema
and run the attached Java code to populate the data and run the query.

Any help is appreciated.


With Thanks & Regards
Karthik Ramachandran




Re: edismax parsing confusion

2017-04-04 Thread Greg Pendlebury
Try declaring your mm as 1 then and see if that assumption is correct.
Default 'mm' values are complicated to describe and depend on a variety of
factors. Generally if you want it to be a certain value, just declare it.
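
For example, pinning it down from SolrJ looks roughly like this (a sketch
only; qf and the query string are copied from the original post, the core URL
is made up, and the builder-style client shown here is the 6.x SolrJ API):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ExplicitMmExample {
  public static void main(String[] args) throws Exception {
    SolrQuery q = new SolrQuery("handbags between rs150 and rs 400");
    q.set("defType", "edismax");
    q.set("qf", "test_product^5 category_path_tf^4 product_id gender");
    q.set("mm", "1");            // declare mm explicitly instead of relying on defaults
    q.set("debugQuery", "true"); // inspect the parsed query to confirm the effect
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      System.out.println(client.query(q).getResponse().get("debug"));
    }
  }
}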

On 5 April 2017 at 02:07, Abhishek Mishra  wrote:

> Hello guys
> sorry for the late response. @steve I am using Solr 5.2.
> @greg I am using the default mm from the config file (as far as I know the
> default mm is 1).
>
> Regards,
> Abhishek
>
> On Tue, Apr 4, 2017 at 5:27 AM, Greg Pendlebury  >
> wrote:
>
> > eDismax uses 'mm', so knowing what that has been set to is important, or
> if
> > it has been left unset/default you would need to consider whether 'q.op'
> > has been set. Or the default operator from the config file.
> >
> > Ta,
> > Greg
> >
> >
> > On 3 April 2017 at 23:56, Steve Rowe  wrote:
> >
> > > Hi Abhishek,
> > >
> > > Which version of Solr are you using?
> > >
> > > I can see that the parsed queries are different, but they’re also very
> > > similar, and there’s a lot of detail there - can you be more specific
> > about
> > > what the problem is?
> > >
> > > --
> > > Steve
> > > www.lucidworks.com
> > >
> > > > On Apr 3, 2017, at 4:54 AM, Abhishek Mishra 
> > > wrote:
> > > >
> > > > Hi all
> > > > i am running solr query with these parameter
> > > >
> > > > bf: "sum(product(new_popularity,100),if(exists(third_price),50,0))"
> > > > qf: "test_product^5 category_path_tf^4 product_id gender"
> > > > q: "handbags between rs150 and rs 400"
> > > > defType: "edismax"
> > > >
> > > > parsed query is like below one
> > > >
> > > > for q:-
> > > > (+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 |
> > gender:handbag |
> > > > test_product:handbag^5.0 | product_id:handbags))
> > > > DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between |
> > > > test_product:between^5.0 | product_id:between))
> > > > +DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 |
> > > > test_product:rs150^5.0 | product_id:rs150))
> > > > +DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs |
> > > > test_product:rs^5.0 | product_id:rs))
> > > > DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 |
> > > > test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"
> > > handbags
> > > > between rs150 ? rs 400")) (DisjunctionMaxQuery(("":"handbags
> > between"))
> > > > DisjunctionMaxQuery(("":"between rs150"))
> DisjunctionMaxQuery(("":"rs
> > > > 400"))) (DisjunctionMaxQuery(("":"handbags between rs150"))
> > > > DisjunctionMaxQuery(("":"between rs150"))
> > > DisjunctionMaxQuery(("":"rs150 ?
> > > > rs")) DisjunctionMaxQuery(("":"? rs 400")))
> > > > FunctionQuery(sum(product(float(new_popularity),const(
> > > 100)),if(exists(float(third_price)),const(50),const(0)/no_coord
> > > >
> > > > but for dismax parser it is working perfect:
> > > >
> > > > (+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 |
> > gender:handbag |
> > > > test_product:handbag^5.0 | product_id:handbags))
> > > > DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between |
> > > > test_product:between^5.0 | product_id:between))
> > > > DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 |
> > > > test_product:rs150^5.0 | product_id:rs150))
> > > > DisjunctionMaxQuery((product_id:and))
> > > > DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs |
> > > > test_product:rs^5.0 | product_id:rs))
> > > > DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 |
> > > > test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"
> > > handbags
> > > > between rs150 ? rs 400"))
> > > > FunctionQuery(sum(product(float(new_popularity),const(
> > > 100)),if(exists(float(third_price)),const(50),const(0)/no_coord
> > > >
> > > >
> > > > *according to me difference between dismax and edismax is based on
> some
> > > extra features plus working of boosting functions.*
> > > >
> > > >
> > > >
> > > > Regards,
> > > > Abhishek
> > >
> > >
> >
>


Re: Problems creating index for suggestions

2017-04-04 Thread Erick Erickson
Something's indeed not what I'd expect here. One note: buildOnCommit
will rebuild the suggester every time the index has a document
committed _anywhere_. So if there's any activity at all in terms of
indexing your suggester is being built. I.e. if you have your
autocommit interval set to 1 minute and are actively indexing, your
suggester gets rebuilt every minute.
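
Since the suggestions only need to be rebuilt weekly (per the note further
down), one option is to set buildOnCommit to false and trigger the build from
a scheduled job instead. A rough SolrJ sketch, reusing the /suggest handler
and suggester names from the config below (the core URL is made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class WeeklySuggesterBuild {
  public static void main(String[] args) throws Exception {
    // Issue an explicit build request against the /suggest handler once a week
    // (e.g. from cron) instead of rebuilding on every commit.
    SolrQuery build = new SolrQuery();
    build.setRequestHandler("/suggest");
    build.set("suggest", "true");
    build.set("suggest.build", "true");
    build.set("suggest.dictionary", "fuzzySuggester");
    build.add("suggest.dictionary", "infixSuggester");
    build.set("suggest.q", "warmup"); // harmless to include; some versions expect a q even for builds
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
      client.query(build);
    }
  }
}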

But that's not your problem. How big is the index this suggester is
part of? You say 8 documents. Exclusive of the suggester parts of the
index, how big is the rest of your index on disk?

The suggester re-reads all of the stored values in your entire base
index for the field _sugerencia_ to build itself. So I'm guessing that
when you say the index is 8 documents it's not quite what you think it
is.

On the admin screen, what are numDocs and maxDocs for the index in question?

Best,
Erick

On Tue, Apr 4, 2017 at 2:11 PM, Alexis Aravena Silva
 wrote:
> Hi,
>
>
> I'm creating an index for suggestions. When I rebuild the index with 8
> documents, Solr creates a temp file that consumes over 20GB in the process
> and it takes more than 10 minutes to reindex. What is the problem? It seems
> illogical that Solr takes so long and consumes that much of my disk:
>
>
>
> Filed Type Definition:
>
>
>  positionIncrementGap="100" multiValued="true">
>   
> 
>  words="stopwords.txt" />
>  maxGramSize="15" />
> 
>   
>   
> 
>   words="stopwords.txt" />
> 
>   
> 
>
>
> Suggester Configuration:
>
>
> 
> 
>   fuzzySuggester
>   FuzzyLookupFactory
>   fuzzy_suggestions
>   DocumentDictionaryFactory
>   _sugerencia_
>   idTipoRegistro
>   text_suggestion
>   false
>   true
> 
> 
>   infixSuggester
>   AnalyzingInfixLookupFactory
>   infix_suggestions
>   DocumentDictionaryFactory
>   _sugerencia_
>   idTipoRegistro
>   text_suggestion
>   false
>   true
> 
>   
>   
> 
>   true
>   infixSuggester
>   fuzzySuggester
>   true
>   10
>   true
> 
> 
>   suggest
> 
>   
>
>
>
> I rebuild the suggestions once a week; that's why I set buildOnCommit = true.
>
>
> Regards.


Re: Solr Shingle is not working properly in solr 6.5.0

2017-04-04 Thread Steve Rowe
Hi Aman,

I’ve created  for this 
problem.

--
Steve
www.lucidworks.com

> On Mar 31, 2017, at 7:34 AM, Aman Deep Singh  
> wrote:
> 
> Hi Rich,
> Query creation is correct; the only thing causing the problem is the
> Boolean + operator while building the Lucene query, which forces all tokens to
> be matched in the document (equivalent of mm=100%). Even though I use mm=1
> it was still using the Boolean + operator.
> Normal query: one plus one abc
> Lucene query -
> +(((+nameShingle:one plus +nameShingle:plus one +nameShingle:one abc))
> ((+nameShingle:one plus +nameShingle:plus one abc)) ((+nameShingle:one plus
> one +nameShingle:one abc)) (nameShingle:one plus one abc))
> 
> Now since my doc contains only "one plus one", its shingle tokens are --
> "one plus", "plus one", "one plus one" --
> thus due to the Boolean + it was not matching.
> Thanks,
> Aman Deep Singh
> 
> On Fri, Mar 31, 2017 at 4:41 PM Rick Leir  wrote:
> 
>> Hi Aman
>> Did you try the Admin Analysis tool? It will show you which filters are
>> effective at index and query time. It will help you understand why you are
>> not getting a mach.
>> Cheers -- Rick
>> 
>> On March 31, 2017 2:36:33 AM EDT, Aman Deep Singh <
>> amandeep.coo...@gmail.com> wrote:
>>> Hi,
>>> I was trying to use the shingle filter but it was not creating the
>>> query as
>>> desirable.
>>> 
>>> my schema is
>>> >> positionIncrementGap=
>>> "100">  
>>> >> class="solr.ShingleFilterFactory" outputUnigrams="false"
>>> maxShingleSize="4"
>>> />  
>>> 
>>> >> stored="true"/>
>>> 
>>> my solr query is
>>> 
>> http://localhost:8983/solr/productCollection/select?defType=edismax&debugQuery=true&q=one%20plus%20one%20four&qf=nameShingle&;
>>> *sow=false*&wt=xml
>>> 
>>> and it was creating the parsed query as
>>> 
>>> (+(DisjunctionMaxQuery(((+nameShingle:one plus +nameShingle:plus one
>>> +nameShingle:one four))) DisjunctionMaxQuery(((+nameShingle:one plus
>>> +nameShingle:plus one four))) DisjunctionMaxQuery(((+nameShingle:one
>>> plus
>>> one +nameShingle:one four))) DisjunctionMaxQuery((nameShingle:one plus
>>> one
>>> four)))~1)/no_coord
>>> 
>>> 
>>> *++nameShingle:one plus +nameShingle:plus one +nameShingle:one
>>> four))
>>> ((+nameShingle:one plus +nameShingle:plus one four)) ((+nameShingle:one
>>> plus one +nameShingle:one four)) (nameShingle:one plus one four))~1)*
>>> 
>>> 
>>> 
>>> So ideally token creations is perfect but in the query it is using
>>> boolean + operator which is causing the problem as if i have a document
>>> with name as
>>> "one plus one" ,according to the shingles it has to matched as its
>>> token
>>> will be  ("one plus","one plus one","plus one") .
>>> I have tried using the q.op and played around the mm also but nothing
>>> is
>>> giving me the correct response.
>>> Any idea how i can fetch that document even if the document is missing
>>> any
>>> token.
>>> 
>>> My expected response will be getting the document
>>> "one plus one" even the user query has any additional term like "one
>>> plus
>>> one two" and so on.
>>> 
>>> 
>>> Thanks,
>>> Aman Deep Singh
>> 
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.



Problems creating index for suggestions

2017-04-04 Thread Alexis Aravena Silva
Hi,


I'm creating an index for suggestions. When I rebuild the index with 8
documents, Solr creates a temp file that consumes over 20GB in the process and
it takes more than 10 minutes to reindex. What is the problem? It seems
illogical that Solr takes so long and consumes that much of my disk:



Filed Type Definition:



  




  
  

 

  



Suggester Configuration:




  fuzzySuggester
  FuzzyLookupFactory
  fuzzy_suggestions
  DocumentDictionaryFactory
  _sugerencia_
  idTipoRegistro
  text_suggestion
  false
  true


  infixSuggester
  AnalyzingInfixLookupFactory
  infix_suggestions
  DocumentDictionaryFactory
  _sugerencia_
  idTipoRegistro
  text_suggestion
  false
  true

  
  

  true
  infixSuggester
  fuzzySuggester
  true
  10
  true


  suggest

  



I rebuild the suggestions once a week; that's why I set buildOnCommit = true.


Regards.


Re: Problem starting solr 6.5

2017-04-04 Thread wlee
Thanks.  I ran chmod 777 on the solr directory and I can start Solr 6.5 now.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-starting-solr-6-5-tp4328227p4328373.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Fq and termfrequency are not showing the correct results

2017-04-04 Thread Erick Erickson
Functions like termfreq operate on single terms post analysis

Since it's an analyzed field you have no _term_ "bachelor's degree" or
even "bachelor degree" in the field. You have two terms, "bachelor"
and "degree". This also assumes that by "zero results" you mean you
get no frequency information back.

Best,
Erick

On Tue, Apr 4, 2017 at 2:19 AM, Ayush Gupta  wrote:
> Hi Everyone,
>
> I have a document that contains data like this "Bachelor's degree is easier
> to get" in the 'body' field and I am making a query on this field searching
> for word 'Bachelor's degree' like this -
> query?fq=body:"bachelor%27s%20degree"&fl=body_frequency:termfreq(body,"bachelor%27s%20degree"),body
> and I am getting zero results in response even when I have documents that
> contains words like 'Bachelor's degree'.
>
> I checked in the admin panel Analysis tab; there I can see the
> WordDelimiterFilterFactory applied to the word 'Bachelor's Degree',
> converting it to 'Bachelor degree'. So in both the Field Value (Query) and
> Field Value (Index) analysis the WordDelimiterFilterFactory converts
> 'Bachelor's Degree' to 'Bachelor degree', so why am I getting zero
> results when querying? I have attached the screenshots of my analysis page.
>
>
> I have attached a code file 'code.txt' where you can see the code for the
> field 'body'.
>
>
> Please tell me what I am doing wrong.
>
> Thanks
>
>
>
> algoscale technologies private limited


Re: Phrase Fields performance

2017-04-04 Thread Erick Erickson
bq: ...and reducing the boost values to much smaller numbers...

not sure why that would matter for performance, multiplying is
multiplying, although reducing the boost on the default field might
have added up to a _lot_ of math ops.

Or is the boosting just a way to change the ranking to something you
can live with and not really a comment on performance?

I suspect the big difference is reducing the number of fields in "qf"
and my guess would be that the two fields omitted are larger text
fields.

FWIW,
Erick

On Tue, Apr 4, 2017 at 8:36 AM, David Hastings
 wrote:
> FYI, I think I managed to get the results and the speeds that I wanted
> back by reducing the number of fields in the qf/pf values from 6 to 4, also
> making sure not to boost the default field, and reducing the boost values
> to much smaller numbers but still significant enough to boost properly. So I
> went from around .3 seconds pre qf/pf, to above 1 sec after aggressive
> settings, and now back down to around half a second with modified values,
> which I can live with.   Also, if anyone else like myself stores qtimes in a
> table, this is a good 15-minute rolling average SQL query you may or may not
> find useful:
>
>
> SELECT when_done as timestamp, AVG( qtime ), count(id)  FROM qtimes WHERE
>  `when_done` >=  '2017-03-23 09:00:00' AND `when_done` <=  '2017-03-23
> 13:00:00' GROUP BY year(when_done),month(when_done),day(when_done),( 4 *
> HOUR( when_done ) + FLOOR( MINUTE( when_done ) / 15 ))  ORDER BY
>  `qtimes`.`when_done` ASC;
>
>
>
>
>
> pre qf/pf values:
> | timestamp   | AVG( qtime ) | count(id) |
> +-+--+---+
> | 2017-03-23 09:00:00 | 322.0585 |   581 |
> | 2017-03-23 09:15:01 | 243.9634 |   628 |
> | 2017-03-23 09:30:00 | 347.1856 |   652 |
> | 2017-03-23 09:45:03 | 407.3195 |   673 |
> | 2017-03-23 10:00:02 | 307.1313 |   678 |
> | 2017-03-23 10:15:00 | 266.9802 |   759 |
> | 2017-03-23 10:30:01 | 288.1789 |   833 |
> | 2017-03-23 10:45:01 | 275.0880 |   852 |
> | 2017-03-23 11:00:02 | 417.0151 |   861 |
> | 2017-03-23 11:15:01 | 267.1153 |   945 |
> | 2017-03-23 11:30:00 | 387.1656 |   803 |
> | 2017-03-23 11:45:00 | 268.5137 |   837 |
> | 2017-03-23 12:00:00 | 294.5911 |   807 |
> | 2017-03-23 12:15:00 | 411.8617 |   752 |
> | 2017-03-23 12:30:00 | 478.3566 |   788 |
> | 2017-03-23 12:45:01 | 262.2294 |   680 |
>
>
>
> after pf/qf values but too aggressive:
>
> | timestamp   | AVG( qtime ) | count(id) |
> +-+--+---+
> | 2017-04-03 09:00:04 |1002.1900 |   600 |
> | 2017-04-03 09:15:04 | 873.2367 |   659 |
> | 2017-04-03 09:30:00 |1013.9041 |   563 |
> | 2017-04-03 09:45:01 |1256.8596 |   591 |
> | 2017-04-03 10:00:08 |1092.8582 |   663 |
> | 2017-04-03 10:15:00 |1322.4262 |   671 |
> | 2017-04-03 10:30:06 | 848.1130 |   770 |
> | 2017-04-03 10:45:00 |1039.3202 |   887 |
> | 2017-04-03 11:00:00 |1144.9216 |   536 |
> | 2017-04-03 11:15:02 | 620.8999 |   719 |
> | 2017-04-03 11:30:03 | 999.7113 |   665 |
> | 2017-04-03 11:45:00 |1144.1348 |   564 |
> | 2017-04-03 12:00:01 |1317.7461 |   453 |
> | 2017-04-03 12:15:02 |1413.5864 |   573 |
> | 2017-04-03 12:30:02 | 746.9422 |   623 |
> | 2017-04-03 12:45:00 |1088.4789 |   568 |
>
>
> and finally modified pf/qf values changed at exactly 10:46 am today:
>
>
> +-+--+---+
> | timestamp   | AVG( qtime ) | count(id) |
> +-+--+---+
> | 2017-04-04 09:00:00 |1079.3983 |   605 |
> | 2017-04-04 09:15:04 |1190.4540 |   544 |
> | 2017-04-04 09:30:00 |1459.6425 |   621 |
> | 2017-04-04 09:45:00 |2074.2777 |   677 |
> | 2017-04-04 10:00:01 |1555.0798 |   664 |
> | 2017-04-04 10:15:00 |1313.1793 |   697 |
> | 2017-04-04 10:30:00 |1042.4969 |   809 |
> | 2017-04-04 10:45:00 | 773.2043 |   695 |
> | 2017-04-04 11:00:00 | 526.7830 |   788 |
> | 2017-04-04 11:15:01 | 470.1969 |   711 |
> | 2017-04-04 11:30:02 | 642.1838 |   136 |
>
>
>
>
> On Sat, Apr 1, 2017 at 11:13 AM, Dave  wrote:
>
>> Maybe commongrams could help this but it boils down to
>> speed/quality/cheap. Choose two. Thanks
>>
>> > On Apr 1, 2017, at 10:28 AM, Shawn Heisey  wrote:
>> >
>> >> On 3/31/2017 1:55 PM, David Hastings wrote:
>> >> So I un-commented out the line, to enable it to go against 6 important
>> >> fields. Afterwards through monitoring performance I noticed that my
>> >> searches were taking roughly 50% to 100% (2x!) longer, and it started
>> >> at the exact time I committed that change, 1:40 pm, qtimes below in a
>> >> 15 minute average cycle with the start time listed.
>> >
>> > That is fully expected.  Using bo

Re:solr learning_to_rank (normalizer) unmatched argument type issue

2017-04-04 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Jianxiong,

Thanks for reporting this. I think this is a bug and have filed 
https://issues.apache.org/jira/browse/SOLR-10421 ticket for fixing it.

Regards,
Christine

- Original Message -
From: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
At: 03/31/17 23:19:27

Hi,
I created a toy learning-to-rank model in solr in order to show the issues.

Feature.json
-
[
  {
"store" : "wikiFeatureStore",
"name" : "doc_len",
"class" : "org.apache.solr.ltr.feature.FieldLengthFeature",
"params" : {"field":"a_text"}
  },
  {
"store" : "wikiFeatureStore",
"name" : "rankScore",
"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
"params" : {}
  }
]

model.json
---
{
  "store" : "wikiFeatureStore",
  "class" : "org.apache.solr.ltr.model.LinearModel",
  "name" : "wiki_qaModel",
  "features" : [
{ "name" : "doc_len",
  "norm" : {
  "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
  "params" : {"min": "1.0", "max" : "113.8" }
  }
},
   { "name" : "rankScore",
  "norm" : {
  "class" : "org.apache.solr.ltr.norm.MinMaxNormalizer",
  "params" : {"min": "0.0", "max" : "49.60385" }
  }
}
   ],
  "params" : {
  "weights": {
   "doc_len": 0.322,
   "rankScore": 0.98
  }
   }
}

I could upload both the features and the model and performed re-ranking based
on the above model.   The issue appeared when I stopped the Solr
server and restarted it:
I got an error message when I ran the same query to extract the features:
"Caused by: org.apache.solr.common.SolrException: Failed to create new
ManagedResource /schema/model-store of type
org.apache.solr.ltr.store.rest.ManagedModelStore due to:
java.lang.IllegalArgumentException: argument type mismatch
at 
org.apache.solr.rest.RestManager.createManagedResource(RestManager.java:700)
at 
org.apache.solr.rest.RestManager.addRegisteredResource(RestManager.java:666)
at org.apache.solr.rest.RestManager.access$300(RestManager.java:59)
at 
org.apache.solr.rest.RestManager$Registry.registerManagedResource(RestManager.java:231)
at 
org.apache.solr.ltr.store.rest.ManagedModelStore.registerManagedModelStore(ManagedModelStore.java:51)
at 
org.apache.solr.ltr.search.LTRQParserPlugin.inform(LTRQParserPlugin.java:124)
at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:719)
at org.apache.solr.core.SolrCore.(SolrCore.java:931)
... 9 more
Caused by: java.lang.IllegalArgumentException: argument type mismatch
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.solr.util.SolrPluginUtils.invokeSetters(SolrPluginUtils.java:1077)
at org.apache.solr.ltr.norm.Normalizer.getInstance(Normalizer.java:49)
"

I found that the issue was related to
solr-6.4.2/server/solr/my_collection/conf/_schema_model-store.json
"
{
  "initArgs":{},
  "initializedOn":"2017-03-31T20:51:59.494Z",
  "updatedSinceInit":"2017-03-31T20:54:54.841Z",
  "managedList":[{
  "name":"wiki_qaModel",
  "class":"org.apache.solr.ltr.model.LinearModel",
  "store":"wikiFeatureStore",
  "features":[
{
  "name":"doc_len",
  "norm":{
"class":"org.apache.solr.ltr.norm.MinMaxNormalizer",
"params":{
  "min":1.0,
  "max":113.7862548828}}},
...
"

Here the data types for "min" and "max" are double. When I manually
changed them to strings, everything worked as expected.

"
 "norm":{
"class":"org.apache.solr.ltr.norm.MinMaxNormalizer",
"params":{
  "min": "1.0",
  "max": "113.7862548828"}}},


Any insights into the above strange behavior?

Thanks

Jianxiong



Problem with multi-valued field using Solr CEL

2017-04-04 Thread Charlie Hubbard
So I'm trying to index documents using Solr Cell and Tika on Solr 5.4.1.
I'm using the default configuration, but when I import my docs I'm getting
this error:

125973 INFO  (qtp840863278-17) [   x:fusearchiver] o.a.s.c.PluginBag Going
to create a new requestHandler with {type = requestHandler,name =
/update/extract,class = solr.extraction.ExtractingRequestHandler,args =
{defaults={lowernames=true,uprefix=ignored_,captureAttr=true,fmap.a=links,fmap.div=ignored_}}}

127134 INFO  (qtp840863278-17) [   x:fusearchiver]
o.a.s.u.p.LogUpdateProcessorFactory [fusearchiver] webapp=/solr
path=/update/extract
params={literal.archiveDate_dt=Mon+Apr+03+21:16:48+EDT+2017&literal._accountId=2&literal.categories=taxes&literal.categories=5498&
literal.id=b5701a36-0dec-4746-bb5d-3c307a557cd7&literal._batchId=25&literal._type=document&literal._filename=2016-0664-Form-5498.pdf&literal._employeeNumber=1411&wt=javabin&literal._employeeFuseId=1&literal.effectiveDate_dt=Sat+Dec+31+00:00:00+EST+2016&literal._json={"accountId":2,"archiveDate":1491268608431,"batchId":25,"categories":["taxes","5498"],"effectiveDate":148316040,"employeeFuseId":1,"employeeNumber":"1411","fileName":"2016-0664-Form-5498.pdf","id":"b5701a36-0dec-4746-bb5d-3c307a557cd7","imageUrl":null,"path":"2016-0664-Form-5498.pdf","uploadedBy":null,"url":null}&version=2}
{} 0 1161

127135 ERROR (qtp840863278-17) [   x:fusearchiver]
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: ERROR:
[doc=b5701a36-0dec-4746-bb5d-3c307a557cd7] multiple values encountered for
non multiValued field meta: [dcterms:modified, 2017-03-16T23:14:41Z,
meta:creation-date, 2017-03-16T23:14:41Z, meta:save-date,
2017-03-16T23:14:41Z, pdf:PDFVersion, 1.4, dcterms:created,
2017-03-16T23:14:41Z, Last-Modified, 2017-03-16T23:14:41Z, date,
2017-03-16T23:14:41Z, X-Parsed-By, org.apache.tika.parser.DefaultParser,
X-Parsed-By, org.apache.tika.parser.pdf.PDFParser, modified,
2017-03-16T23:14:41Z, xmpTPg:NPages, 2, Creation-Date,
2017-03-16T23:14:41Z, pdf:encrypted, false, created, Thu Mar 16 23:14:41
UTC 2017, stream_size, null, dc:format, application/pdf; version=1.4,
producer, Ricoh Americas Corporation, AFP2PDF, Content-Type,
application/pdf, xmp:CreatorTool, Ricoh Americas Corporation, AFP2PDF Plus
Version: 1.014.10, Last-Save-Date, 2017-03-16T23:14:41Z]

at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:92)

at
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:83)

at
org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:273)

at
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:207)

at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)

at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)

at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:49)

at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:924)

at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1079)

at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:702)

at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)

at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:126)

at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:131)

at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:237)

at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:70)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)

Here is my solrconfig.xml of the extract module:


  
true
ignored_


true
links
ignored_
  


I thought this would basically mark everything that wasn't a schema field as
ignored, so meta shouldn't be imported.  I've searched through my Solr
schema, and I have no meta field declared, hence I thought Solr Cell would
throw it out.

I'm using Solrj to import the docs.  I'm also adding a lot of literals to
the document.  You can see above the data that I'm providing in literals.

Why am I seeing this error?

Can I simply have it only extract the information and I'll put it in a text
field and have it process the HTML in the same manner to work around this
issue?

TIA
Charlie


Re: edismax parsing confusion

2017-04-04 Thread Abhishek Mishra
Hello guys,
sorry for the late response. @steve I am using Solr 5.2.
@greg I am using the default mm from the config file (as far as I know the
default mm is 1).

Regards,
Abhishek

On Tue, Apr 4, 2017 at 5:27 AM, Greg Pendlebury 
wrote:

> eDismax uses 'mm', so knowing what that has been set to is important, or if
> it has been left unset/default you would need to consider whether 'q.op'
> has been set. Or the default operator from the config file.
>
> Ta,
> Greg
>
>
> On 3 April 2017 at 23:56, Steve Rowe  wrote:
>
> > Hi Abhishek,
> >
> > Which version of Solr are you using?
> >
> > I can see that the parsed queries are different, but they’re also very
> > similar, and there’s a lot of detail there - can you be more specific
> about
> > what the problem is?
> >
> > --
> > Steve
> > www.lucidworks.com
> >
> > > On Apr 3, 2017, at 4:54 AM, Abhishek Mishra 
> > wrote:
> > >
> > > Hi all
> > > i am running solr query with these parameter
> > >
> > > bf: "sum(product(new_popularity,100),if(exists(third_price),50,0))"
> > > qf: "test_product^5 category_path_tf^4 product_id gender"
> > > q: "handbags between rs150 and rs 400"
> > > defType: "edismax"
> > >
> > > parsed query is like below one
> > >
> > > for q:-
> > > (+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 |
> gender:handbag |
> > > test_product:handbag^5.0 | product_id:handbags))
> > > DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between |
> > > test_product:between^5.0 | product_id:between))
> > > +DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 |
> > > test_product:rs150^5.0 | product_id:rs150))
> > > +DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs |
> > > test_product:rs^5.0 | product_id:rs))
> > > DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 |
> > > test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"
> > handbags
> > > between rs150 ? rs 400")) (DisjunctionMaxQuery(("":"handbags
> between"))
> > > DisjunctionMaxQuery(("":"between rs150")) DisjunctionMaxQuery(("":"rs
> > > 400"))) (DisjunctionMaxQuery(("":"handbags between rs150"))
> > > DisjunctionMaxQuery(("":"between rs150"))
> > DisjunctionMaxQuery(("":"rs150 ?
> > > rs")) DisjunctionMaxQuery(("":"? rs 400")))
> > > FunctionQuery(sum(product(float(new_popularity),const(
> > 100)),if(exists(float(third_price)),const(50),const(0)/no_coord
> > >
> > > but for dismax parser it is working perfect:
> > >
> > > (+(DisjunctionMaxQuery((category_path_tf:handbags^4.0 |
> gender:handbag |
> > > test_product:handbag^5.0 | product_id:handbags))
> > > DisjunctionMaxQuery((category_path_tf:between^4.0 | gender:between |
> > > test_product:between^5.0 | product_id:between))
> > > DisjunctionMaxQuery((category_path_tf:rs150^4.0 | gender:rs150 |
> > > test_product:rs150^5.0 | product_id:rs150))
> > > DisjunctionMaxQuery((product_id:and))
> > > DisjunctionMaxQuery((category_path_tf:rs^4.0 | gender:rs |
> > > test_product:rs^5.0 | product_id:rs))
> > > DisjunctionMaxQuery((category_path_tf:400^4.0 | gender:400 |
> > > test_product:400^5.0 | product_id:400))) DisjunctionMaxQuery(("":"
> > handbags
> > > between rs150 ? rs 400"))
> > > FunctionQuery(sum(product(float(new_popularity),const(
> > 100)),if(exists(float(third_price)),const(50),const(0)/no_coord
> > >
> > >
> > > *according to me difference between dismax and edismax is based on some
> > > extra features plus working of boosting functions.*
> > >
> > >
> > >
> > > Regards,
> > > Abhishek
> >
> >
>


Re: Phrase Fields performance

2017-04-04 Thread David Hastings
FYI, I think I managed to get the results and the speeds that I wanted
back by reducing the number of fields in the qf/pf values from 6 to 4, also
making sure not to boost the default field, and reducing the boost values
to much smaller numbers but still significant enough to boost properly. So I
went from around .3 seconds pre qf/pf, to above 1 sec after aggressive
settings, and now back down to around half a second with modified values,
which I can live with.   Also, if anyone else like myself stores qtimes in a
table, this is a good 15-minute rolling average SQL query you may or may not
find useful:


SELECT when_done as timestamp, AVG( qtime ), count(id)  FROM qtimes WHERE
 `when_done` >=  '2017-03-23 09:00:00' AND `when_done` <=  '2017-03-23
13:00:00' GROUP BY year(when_done),month(when_done),day(when_done),( 4 *
HOUR( when_done ) + FLOOR( MINUTE( when_done ) / 15 ))  ORDER BY
 `qtimes`.`when_done` ASC;





pre qf/pf values:
| timestamp   | AVG( qtime ) | count(id) |
+-+--+---+
| 2017-03-23 09:00:00 | 322.0585 |   581 |
| 2017-03-23 09:15:01 | 243.9634 |   628 |
| 2017-03-23 09:30:00 | 347.1856 |   652 |
| 2017-03-23 09:45:03 | 407.3195 |   673 |
| 2017-03-23 10:00:02 | 307.1313 |   678 |
| 2017-03-23 10:15:00 | 266.9802 |   759 |
| 2017-03-23 10:30:01 | 288.1789 |   833 |
| 2017-03-23 10:45:01 | 275.0880 |   852 |
| 2017-03-23 11:00:02 | 417.0151 |   861 |
| 2017-03-23 11:15:01 | 267.1153 |   945 |
| 2017-03-23 11:30:00 | 387.1656 |   803 |
| 2017-03-23 11:45:00 | 268.5137 |   837 |
| 2017-03-23 12:00:00 | 294.5911 |   807 |
| 2017-03-23 12:15:00 | 411.8617 |   752 |
| 2017-03-23 12:30:00 | 478.3566 |   788 |
| 2017-03-23 12:45:01 | 262.2294 |   680 |



after pf/qf values but too aggressive:

| timestamp   | AVG( qtime ) | count(id) |
+-+--+---+
| 2017-04-03 09:00:04 |1002.1900 |   600 |
| 2017-04-03 09:15:04 | 873.2367 |   659 |
| 2017-04-03 09:30:00 |1013.9041 |   563 |
| 2017-04-03 09:45:01 |1256.8596 |   591 |
| 2017-04-03 10:00:08 |1092.8582 |   663 |
| 2017-04-03 10:15:00 |1322.4262 |   671 |
| 2017-04-03 10:30:06 | 848.1130 |   770 |
| 2017-04-03 10:45:00 |1039.3202 |   887 |
| 2017-04-03 11:00:00 |1144.9216 |   536 |
| 2017-04-03 11:15:02 | 620.8999 |   719 |
| 2017-04-03 11:30:03 | 999.7113 |   665 |
| 2017-04-03 11:45:00 |1144.1348 |   564 |
| 2017-04-03 12:00:01 |1317.7461 |   453 |
| 2017-04-03 12:15:02 |1413.5864 |   573 |
| 2017-04-03 12:30:02 | 746.9422 |   623 |
| 2017-04-03 12:45:00 |1088.4789 |   568 |


and finally modified pf/qf values changed at exactly 10:46 am today:


+-+--+---+
| timestamp   | AVG( qtime ) | count(id) |
+-+--+---+
| 2017-04-04 09:00:00 |1079.3983 |   605 |
| 2017-04-04 09:15:04 |1190.4540 |   544 |
| 2017-04-04 09:30:00 |1459.6425 |   621 |
| 2017-04-04 09:45:00 |2074.2777 |   677 |
| 2017-04-04 10:00:01 |1555.0798 |   664 |
| 2017-04-04 10:15:00 |1313.1793 |   697 |
| 2017-04-04 10:30:00 |1042.4969 |   809 |
| 2017-04-04 10:45:00 | 773.2043 |   695 |
| 2017-04-04 11:00:00 | 526.7830 |   788 |
| 2017-04-04 11:15:01 | 470.1969 |   711 |
| 2017-04-04 11:30:02 | 642.1838 |   136 |




On Sat, Apr 1, 2017 at 11:13 AM, Dave  wrote:

> Maybe commongrams could help this but it boils down to
> speed/quality/cheap. Choose two. Thanks
>
> > On Apr 1, 2017, at 10:28 AM, Shawn Heisey  wrote:
> >
> >> On 3/31/2017 1:55 PM, David Hastings wrote:
> >> So I un-commented out the line, to enable it to go against 6 important
> >> fields. Afterwards through monitoring performance I noticed that my
> >> searches were taking roughly 50% to 100% (2x!) longer, and it started
> >> at the exact time I committed that change, 1:40 pm, qtimes below in a
> >> 15 minute average cycle with the start time listed.
> >
> > That is fully expected.  Using both pf and qf basically has Solr doing
> > the exact same queries twice, once as specified on fields in qf, then
> > again as a phrase query on fields in pf.  If you add pf2 and/or pf3, you
> > can expect further speed drops.
> >
> > If you're sorting by relevancy, using pf with higher boosts than qf
> > generally will make your results better, but it comes at a cost in
> > performance.
> >
> > Thanks,
> > Shawn
> >
>


Re: How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Anshum Gupta
Hi Ketan,

I just want to be sure about your understanding of the 'implicit' router.

Implicit router in Solr puts the onus of correctly routing the documents on
the user, instead of 'implicitly' or automatically routing them.

-Anshum

On Tue, Apr 4, 2017 at 2:01 AM Ketan Thanki  wrote:

>
> Hi,
>
> Need help with how to apply 'implicit' routing to existing collections.
> e.g.: I have configured 2 collections, each with 4 shards and 4
> replicas, so what changes should I
> make to apply 'implicit' routing?
>
> Please do the needful with some examples.
>
> Regards,
> Ketan.
>
> [CC Award Winners!]
>
>


Re: Using function queries for faceting

2017-04-04 Thread Mikhail Khludnev
Exclude the user's products, calculate the default price facet, then facet only
the user's products (in a main query) and sum the facet counts. It can probably
be done by switching domains in JSON facets.
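
To make the earlier facet.query suggestion concrete, here is a rough SolrJ
sketch; the price ranges and core name are made up, and each bucket is an
{!frange} over the same def() function already used for fl:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class EffectivePriceFacetSketch {
  public static void main(String[] args) throws Exception {
    SolrQuery q = new SolrQuery("*:*");
    q.setFields("id", "effective_price:def(price_group1,price_default)");
    q.setFacet(true);
    // One facet.query per price bucket, each ranging over the effective price.
    q.addFacetQuery("{!frange l=0 u=10}def(price_group1,price_default)");
    q.addFacetQuery("{!frange l=10 u=20}def(price_group1,price_default)");
    q.addFacetQuery("{!frange l=20 u=50}def(price_group1,price_default)");
    try (HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/items").build()) {
      System.out.println(client.query(q).getFacetQuery());
    }
  }
}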

On Tue, Apr 4, 2017 at 5:43 PM, Georg Sorst  wrote:

> Hi Mikhail,
>
> copying the default field was my first attempt as well - however, the
> system in total has over 50.000 users which may have an individual price on
> every product (even though they usually don't). Still, with the copying
> approach this results in every document having 50.000 price fields. Solr
> completely chokes trying to import this data.
>
> Best,
> Georg
>
> Mikhail Khludnev  schrieb am Di., 4. Apr. 2017 um
> 15:28 Uhr:
>
> > Hello Georg,
> > You can probably use {!frange} and a few facet.query enumerating
> price
> > ranges, but probably it's easier to just copy default price across all
> > empty price groups in index time.
> >
> >
> > On Tue, Apr 4, 2017 at 1:14 PM, Georg Sorst 
> wrote:
> >
> > > Hi list!
> > >
> > > My documents are eCommerce items. They may have a special price for a
> > > certain group of users, but not for other groups of users; in that case
> > the
> > > default price should be used. So the documents look like something like
> > > this:
> > >
> > > item:
> > >   id: 1
> > >   price_default: 11.5
> > >   price_group1: 11.2
> > > item:
> > >   id: 2
> > >   price_default: 12.3
> > >   price_group2: 12.5
> > >
> > > Now when I want to fetch the documents and display the correct price
> for
> > > group1 I can use 'fl=def(price_group1,price_default)'. Works like a
> > charm!
> > > It will return price_group1 for document 1 and price_default for
> document
> > > 2.
> > >
> > > Is there a way to do this for faceting as well? I've unsuccessfully
> > tried:
> > >
> > > * facet.field=def(price_group1,price_default)
> > > * facet.field=effective_price:def(price_group1,price_default)
> > > * facet.field={!func}def(price_group1,price_default)
> > > * facet.field={!func}effective_price:def(price_group1,price_default)
> > > * json.facet={price:"def(price_group1,price_default)"}
> > >
> > > I'm fine with either the "old" facet API or the JSON facets.Any ideas?
> > >
> > > Thanks!
> > > Georg
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Using function queries for faceting

2017-04-04 Thread Georg Sorst
Hi Mikhail,

copying the default field was my first attempt as well - however, the
system in total has over 50.000 users which may have an individual price on
every product (even though they usually don't). Still, with the copying
approach this results in every document having 50.000 price fields. Solr
completely chokes trying to import this data.

Best,
Georg

Mikhail Khludnev  schrieb am Di., 4. Apr. 2017 um
15:28 Uhr:

> Hello Georg,
> You can probably use {!frange} and a few facet.query enumerating price
> ranges, but probably it's easier to just copy default price across all
> empty price groups in index time.
>
>
> On Tue, Apr 4, 2017 at 1:14 PM, Georg Sorst  wrote:
>
> > Hi list!
> >
> > My documents are eCommerce items. They may have a special price for a
> > certain group of users, but not for other groups of users; in that case
> the
> > default price should be used. So the documents look like something like
> > this:
> >
> > item:
> >   id: 1
> >   price_default: 11.5
> >   price_group1: 11.2
> > item:
> >   id: 2
> >   price_default: 12.3
> >   price_group2: 12.5
> >
> > Now when I want to fetch the documents and display the correct price for
> > group1 I can use 'fl=def(price_group1,price_default)'. Works like a
> charm!
> > It will return price_group1 for document 1 and price_default for document
> > 2.
> >
> > Is there a way to do this for faceting as well? I've unsuccessfully
> tried:
> >
> > * facet.field=def(price_group1,price_default)
> > * facet.field=effective_price:def(price_group1,price_default)
> > * facet.field={!func}def(price_group1,price_default)
> > * facet.field={!func}effective_price:def(price_group1,price_default)
> > * json.facet={price:"def(price_group1,price_default)"}
> >
> > I'm fine with either the "old" facet API or the JSON facets.Any ideas?
> >
> > Thanks!
> > Georg
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


RE: Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Markus Jelsma
Opened: https://issues.apache.org/jira/browse/SOLR-10420

Thanks,
Markus

 
 
-Original message-
> From:Shalin Shekhar Mangar 
> Sent: Tuesday 4th April 2017 16:11
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 6.x leaking one SolrZkClient instance per second?
> 
> Please open a Jira issue. Thanks!
> 
> On Tue, Apr 4, 2017 at 7:16 PM, Markus Jelsma
>  wrote:
> > Hi,
> >
> > One of our nodes went berserk after a restart, Solr went completely nuts! 
> > So i opened VisualVM to keep an eye on it and spotted a different problem 
> > that occurs in all our Solr 6.4.2 and 6.5.0 nodes.
> >
> > It appears Solr is leaking one SolrZkClient instance per second via 
> > DistributedQueue$ChildWatcher. That one per second is quite accurate for 
> > all nodes, there are about the same amount of instances as there are 
> > seconds since Solr started. I know VisualVM's instance count includes 
> > objects-to-be-collected, the instance count does not drop after a forced 
> > garbage collection round.
> >
> > It doesn't matter how many cores or collections the nodes carry or how 
> > heavy traffic is.
> >
> > Is this a known issue? Ticket?
> >
> > Thanks,
> > Markus
> 
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 


Re: Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Shalin Shekhar Mangar
Please open a Jira issue. Thanks!

On Tue, Apr 4, 2017 at 7:16 PM, Markus Jelsma
 wrote:
> Hi,
>
> One of our nodes went berserk after a restart, Solr went completely nuts! 
> So i opened VisualVM to keep an eye on it and spotted a different problem 
> that occurs in all our Solr 6.4.2 and 6.5.0 nodes.
>
> It appears Solr is leaking one SolrZkClient instance per second via 
> DistributedQueue$ChildWatcher. That one per second is quite accurate for all 
> nodes, there are about the same amount of instances as there are seconds 
> since Solr started. I know VisualVM's instance count includes 
> objects-to-be-collected, the instance count does not drop after a forced 
> garbage collection round.
>
> It doesn't matter how many cores or collections the nodes carry or how heavy 
> traffic is.
>
> Is this a known issue? Ticket?
>
> Thanks,
> Markus



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Shawn Heisey
On 4/3/2017 7:52 AM, Salih Sen wrote:
> We have a three server set up with each server having 756G ram, 48
> cores, 4SSDs (each having tree solr instances on them) and a dedicated
> mechanical disk for zookeeper (3 zk instances total). Each Solr
> instances have 31G of heap space allocated to them. In total we have
> 36 Solr Instances and 3 Zookeeper instances (with 1G heapspace). Also
> servers 10Gig network between them.

You haven't described your index(es).  How many collections in the
cloud?  How many shards for each?  How many replicas for each shard? 
How many docs in each collection?  How much *total* index data is on
each of those systems?  To determine this, add up the size of the solr
home in all of the Solr instances that exist on that server.  With this
information, we can make an educated guess about whether the setup you
have engineered is reasonably correct for the scale of your data.

It sounds like you have twelve Solr instances per server, with each one
using a 31GB heap.  That's 372GB of memory JUST for Solr heaps.  Unless
you're dealing with terabytes of index data and hundreds of millions (or
billions) of documents, I cannot imagine needing that many Solr
instances per server or that much heap memory.

Have you increased the maximum number of processes that the user which
is running Solr can have?  12 instances of Solr is going to be a LOT of
threads, and on most operating systems, each thread counts against the
user process limit.  Some operating systems might have a separate
configuration for thread limits, but I do know that Linux does not, and
counts them as processes.

> We set Auto hardcommit time to 15sec and 1 docs, and soft commit
> to 6 sec and 5000 seconds in order to avoid soft committing too
> much and avoiding indexing bottlenecks. We also
> set DzkClientTimeout=9.

Side issue: It's generally preferable to only use either maxDoc or
maxTime, and maxTime will usually result in more predictable behavior,
so I recommend removing the maxDoc settings on autoCommit and
autoSoftCommit.  I doubt this will have any effect on the problem you're
experiencing, just something I noticed.  I recommend a maxTime of 60000
(one minute) for autoCommit, with openSearcher set to false, and a
maxTime of at least 120000 (two minutes) for autoSoftCommit.  If these
seem excessively high to you, go with 30000 and 60000.

On zkClientTimeout, unless you have increased the ZK server tickTime,
you'll find that you can't actually define a zkClientTimeout that high. 
The maximum is 20*tickTime.  A typical tickTime value is 2000, which
means that the usual maximum value for zkClientTimeout is 40 seconds. 
The error you've reported doesn't look related to zkClientTimeout, so
increasing that beyond 30 seconds is probably unnecessary.  The default
values for Zookeeper server tuning have been worked on by the ZK
developers for years.  I wouldn't mess with tickTime without a REALLY
good reason.
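
For reference, the cap mentioned above comes from the ZooKeeper server
configuration (illustrative zoo.cfg values, not taken from this cluster):

    # zoo.cfg
    tickTime=2000
    # maxSessionTimeout defaults to 20 * tickTime = 40000 ms, so a client
    # session timeout (zkClientTimeout) above 40 seconds is effectively
    # capped unless maxSessionTimeout or tickTime is raised here.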

Another side issue: Putting Zookeeper data on a mechanical disk when
there are SSDs available seems like a mistake to me.  Zookeeper is even
more sensitive to disk performance than Solr is.

> But it seems replicas still randomly go down while indexing. Do you
> have any suggestions to prevent this situation?

> Caused by: java.net.SocketTimeoutException: Read timed out

This error says that a TCP connection (http on port 9132) from one Solr
server to another hit the socket timeout -- there was no activity on the
connection for whatever the timeout is set to.  Usually a problem like
this has two causes:

1) A *serious* performance issue with Solr resulting in an incredibly
long processing time.  Most performance issues are memory-related.
2) The socket timeout has been set to a very low value.

In a later message on the thread, you indicated that the configured
socket timeout is ten minutes.  This should be plenty, and makes me
think option number one above is what we are dealing with, and the
information I asked for in the first paragraph of this reply is required
for any deeper insight.

Are there other errors in the Solr logfile that you haven't included? 
It seems likely that this is not the only problem Solr has encountered.

Thanks,
Shawn



Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Michael Joyner
Try increasing the number of connections your ZooKeeper allows to a very 
large number.
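
For example (a sketch only; the limit applies per client IP address and the
value below is arbitrary):

    # zoo.cfg
    maxClientCnxns=500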



On 04/04/2017 09:02 AM, Salih Sen wrote:

Hi,

One of the replicas went down again today, somehow disabling all 
updates to the cluster with the error message "Cannot talk to ZooKeeper - 
Updates are disabled." for half an hour.


ZK Leader was on the same server with Solr instance so I doubt it has 
anything to do with network (at least between Solr and ZK leader 
node), restarting the ZK leader seems to resolve the issue and cluster 
accepting updates again.



== Solr Node
WARN  - 2017-04-04 11:49:14.414; [   ] 
org.apache.solr.common.cloud.ConnectionManager; Watcher 
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: 
ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,192.168.30.24:2181 
got event WatchedEvent state:Disconnected type:None path:null path: null type: None
WARN  - 2017-04-04 11:49:15.723; [   ] 
org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
WARN  - 2017-04-04 11:49:15.727; [   ] 
org.apache.solr.common.cloud.ConnectionManager; Watcher 
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name: 
ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,192.168.30.24:2181 
got event WatchedEvent state:Expired type:None path:null path: null type: None
WARN  - 2017-04-04 11:49:15.727; [   ] 
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper 
session was expired. Attempting to reconnect to recover relationship 
with ZooKeeper...
WARN  - 2017-04-04 11:49:15.728; [   ] 
org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection 
expired - starting a new one...
ERROR - 2017-04-04 11:49:22.040; [c:doc s:shard6 r:core_node27 
x:doc_shard6_replica1] org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
Updates are disabled.
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1739)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:703)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at 
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handl

Re: How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Shawn Heisey
On 4/4/2017 3:00 AM, Ketan Thanki wrote:
> Need the help for how to apply 'implicit' routing in existing
> collections. e.g : I have configure the 2 collections with each has 4
> shard and 4 replica so what changes should i do for apply ' implicit'
> routing.

Make a new collection.  Or delete the collection and build it again with
different settings.

Changing the router on a collection is possible if you manually edit the
information in zookeeper, but there is no API to change it, because once
it is set and documents are indexed, changing it would break what
already exists.
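
For illustration, a new collection with the implicit router is created
through the Collections API along these lines (the collection, shard and
config names and the optional router.field are all placeholders):

    http://localhost:8983/solr/admin/collections?action=CREATE
        &name=mycollection
        &router.name=implicit
        &shards=shard1,shard2,shard3,shard4
        &replicationFactor=4
        &router.field=shard_label
        &collection.configName=myconfig

With the implicit router you name the shards yourself via the "shards"
parameter, and each document goes to the shard named in router.field (or in
a _route_ request parameter) instead of being hashed on its id.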

Thanks,
Shawn



Solr 6.x leaking one SolrZkClient instance per second?

2017-04-04 Thread Markus Jelsma
Hi,

One of our nodes went berserk after a restart, Solr went completely nuts! So 
I opened VisualVM to keep an eye on it and spotted a different problem that 
occurs in all our Solr 6.4.2 and 6.5.0 nodes.

It appears Solr is leaking one SolrZkClient instance per second via 
DistributedQueue$ChildWatcher. That one per second is quite accurate for all 
nodes, there are about the same amount of instances as there are seconds since 
Solr started. I know VisualVM's instance count includes 
objects-to-be-collected, but the instance count does not drop after a forced 
garbage collection round.

It doesn't matter how many cores or collections the nodes carry or how heavy 
traffic is.

Is this a known issue? Ticket?

Thanks,
Markus


Implementing DIH - Using a non-datetime change tracking column to Identify delta

2017-04-04 Thread subinalex
Hi Experts,

Can we use a non-datetime column to identify delta rows in the deltaQuery of
a DIH configuration?
For example, in the deltaQuery below,

  deltaQuery="select ID from category where last_modified >
'${dih.last_index_time}'"


the delta rows are picked when the last_modified datetime is greater than
last index time.

I want to pick the deltas if a column value differs from the corresponding
column value in Solr.

  deltaQuery="select ID from category where md5hashcode <>
'indexedmd5hashcode'"



Can we implement this?
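
To make the intent concrete, this is the kind of configuration I am
imagining (it assumes the hash that was last sent to Solr is also kept in
the database, e.g. in a last_indexed_md5 column maintained by the import
job; all names below are made up):

  <entity name="category"
          query="select * from category"
          deltaQuery="select ID from category where md5hashcode &lt;&gt; last_indexed_md5"
          deltaImportQuery="select * from category where ID='${dih.delta.ID}'"/>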








Re: Using function queries for faceting

2017-04-04 Thread Mikhail Khludnev
Hello Georg,
You can probably use {!frange} and a few facet.query parameters enumerating
price ranges, but it's probably easier to just copy the default price into
all empty price groups at index time.
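
For example, something along these lines (the range boundaries are made up):

    facet=true
    &facet.query={!frange u=10 incu=false}def(price_group1,price_default)
    &facet.query={!frange l=10 u=20 incu=false}def(price_group1,price_default)
    &facet.query={!frange l=20}def(price_group1,price_default)

Each facet.query then returns the count of documents whose effective price
falls in that range.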


On Tue, Apr 4, 2017 at 1:14 PM, Georg Sorst  wrote:

> Hi list!
>
> My documents are eCommerce items. They may have a special price for a
> certain group of users, but not for other groups of users; in that case the
> default price should be used. So the documents look something like
> this:
>
> item:
>   id: 1
>   price_default: 11.5
>   price_group1: 11.2
> item:
>   id: 2
>   price_default: 12.3
>   price_group2: 12.5
>
> Now when I want to fetch the documents and display the correct price for
> group1 I can use 'fl=def(price_group1,price_default)'. Works like a charm!
> It will return price_group1 for document 1 and price_default for document
> 2.
>
> Is there a way to do this for faceting as well? I've unsuccessfully tried:
>
> * facet.field=def(price_group1,price_default)
> * facet.field=effective_price:def(price_group1,price_default)
> * facet.field={!func}def(price_group1,price_default)
> * facet.field={!func}effective_price:def(price_group1,price_default)
> * json.facet={price:"def(price_group1,price_default)"}
>
> I'm fine with either the "old" facet API or the JSON facets. Any ideas?
>
> Thanks!
> Georg
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Using Tesseract OCR to extract PDF files in EML file attachment

2017-04-04 Thread AJ Weber
You'll need to use something like javax.mail (or some of the jars that 
have been built on top of it for higher-level access) to open the EML 
files and extract the attachments, then operate on the extracted 
attachments as you would any file.


There are alternative, paid, libraries to parse and extract attachments 
from EML files as well.


EML attachments will have a mimetype associated with their metadata.
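
A minimal sketch with plain javax.mail (the file name is a placeholder, and
in practice you would hand part.getInputStream() to Tika/Tesseract or save
it to disk rather than just printing the metadata):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.Properties;
    import javax.mail.BodyPart;
    import javax.mail.Multipart;
    import javax.mail.Part;
    import javax.mail.Session;
    import javax.mail.internet.MimeMessage;

    public class EmlAttachmentExtractor {
        public static void main(String[] args) throws Exception {
            Session session = Session.getDefaultInstance(new Properties());
            try (InputStream in = new FileInputStream("message.eml")) {
                MimeMessage msg = new MimeMessage(session, in);
                Object content = msg.getContent();
                if (content instanceof Multipart) {
                    Multipart mp = (Multipart) content;
                    for (int i = 0; i < mp.getCount(); i++) {
                        BodyPart part = mp.getBodyPart(i);
                        // Attachments normally carry Content-Disposition: attachment
                        if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
                            System.out.println(part.getFileName()
                                    + " -> " + part.getContentType());
                        }
                    }
                }
            }
        }
    }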



On 4/4/2017 2:00 AM, Zheng Lin Edwin Yeo wrote:

Hi,

Currently, I am able to extract scanned PDF images and index them to Solr
using Tesseract OCR, although the speed is very slow.

However, for EML files with PDF attachments that consist of scanned images,
the Tesseract OCR is not able to extract the text from those PDF
attachments.

Can we use the same method for EML files? Or what are the suggestions that
we can do to extract those attachments?

I'm using Solr 6.5.0

Regards,
Edwin





Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Salih Sen
Hi,

One of the replicas went down again today, somehow disabling all updates to
the cluster with the error message "Cannot talk to ZooKeeper - Updates are
disabled." for half an hour.

ZK Leader was on the same server with Solr instance so I doubt it has
anything to do with network (at least between Solr and ZK leader node),
restarting the ZK leader seems to resolve the issue and cluster accepting
updates again.


== Solr Node
WARN  - 2017-04-04 11:49:14.414; [   ]
org.apache.solr.common.cloud.ConnectionManager; Watcher
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name:
ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,
192.168.30.24:2181 got event WatchedEvent state:Disconnected type:None
path:null path: null type: None
WARN  - 2017-04-04 11:49:15.723; [   ]
org.apache.solr.common.cloud.ConnectionManager; zkClient has disconnected
WARN  - 2017-04-04 11:49:15.727; [   ]
org.apache.solr.common.cloud.ConnectionManager; Watcher
org.apache.solr.common.cloud.ConnectionManager@44ca0f2f name:
ZooKeeperConnection Watcher:192.168.30.32:2181,192.168.30.33:2181,
192.168.30.24:2181 got event WatchedEvent state:Expired type:None path:null
path: null type: None
WARN  - 2017-04-04 11:49:15.727; [   ]
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper
session was expired. Attempting to reconnect to recover relationship with
ZooKeeper...
WARN  - 2017-04-04 11:49:15.728; [   ]
org.apache.solr.common.cloud.DefaultConnectionStrategy; Connection expired
- starting a new one...
ERROR - 2017-04-04 11:49:22.040; [c:doc s:shard6 r:core_node27
x:doc_shard6_replica1] org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
are disabled.
at
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1739)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:703)
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:306)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:271)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:251)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:173)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at

Re: Problem starting solr 6.5

2017-04-04 Thread Rick Leir
Looks like a file permissions problem to me.
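
If that's the case, the usual fix is to make the Solr process owner able to
write to that directory, e.g. (the "solr" user/group below is an assumption,
use whatever account actually runs bin/solr):

    sudo chown -R solr:solr /usr/local/solr-6/solr-6.5.0
    # or, more narrowly, just the logs directory from the error message:
    sudo mkdir -p /usr/local/solr-6/solr-6.5.0/server/logs
    sudo chown solr:solr /usr/local/solr-6/solr-6.5.0/server/logs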

On April 3, 2017 10:42:15 PM EDT, wlee  wrote:
>Try to start solr and get this error message.  What is the problem?
>
>
>$ bin/solr start
>
>Exception in thread "main" java.nio.file.AccessDeniedException:
>/usr/local/solr-6/solr-6.5.0/server/logs
>
>at
>sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>
>at
>sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>
>at
>sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>
>at
>sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>
>at java.nio.file.Files.createDirectory(Files.java:674)
>
>at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>
>at java.nio.file.Files.createDirectories(Files.java:767)
>
>at
>org.apache.solr.util.SolrCLI$UtilsTool.archiveGcLogs(SolrCLI.java:3565)
>
>at org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3548)
>
>at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
>
>Failed archiving old GC logs
>
>Exception in thread "main" java.nio.file.AccessDeniedException:
>/usr/local/solr-6/solr-6.5.0/server/logs
>
>at
>sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>
>at
>sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>
>at
>sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>
>at
>sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>
>at java.nio.file.Files.createDirectory(Files.java:674)
>
>at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>
>at java.nio.file.Files.createDirectories(Files.java:767)
>
>at
>org.apache.solr.util.SolrCLI$UtilsTool.archiveConsoleLogs(SolrCLI.java:3594)
>
>at org.apache.solr.util.SolrCLI$UtilsTool.runTool(SolrCLI.java:3551)
>
>at org.apache.solr.util.SolrCLI.main(SolrCLI.java:250)
>
>Failed archiving old console logs
>
>
>ERROR: Logs directory /usr/local/solr-6/solr-6.5.0/server/logs could
>not be created. Exiting
>
>
>
>
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Getting counts for JSON facet percentiles

2017-04-04 Thread Georg Sorst
Hi list!

Is it possible to get counts for the JSON facet percentiles? Of course I
could trivially calculate them myself (they are percentiles, after all), but
there are cases where these may be off by one, such as calculating the 50th
percentile / median over 3 results.
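
For reference, this is roughly how I request the percentiles today (the
field name "price" is just an example); the response contains only the
percentile values themselves, not how many documents fall on either side:

    json.facet={
      price_quartiles : "percentile(price, 25, 50, 75)"
    }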

Thanks and best,
Georg


Re: Do streaming expressions support range facets?

2017-04-04 Thread Joel Bernstein
The facet expression, which uses the json facet API, does not currently
support range facets, so for now you would have to use the json facet API
directly to do range facets. The facet expression will support range facets
in the near future though.
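
Using the json facet API directly, a date range facet looks roughly like
this (the field name, bounds and gap below are placeholders):

    json.facet={
      per_day : {
        type  : range,
        field : timestamp_dt,
        start : "2017-01-01T00:00:00Z",
        end   : "2017-04-01T00:00:00Z",
        gap   : "+1DAY"
      }
    }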

There is a ticket open which adds date functions which could be used as
part of a rollup expression (
https://issues.apache.org/jira/browse/SOLR-10303). A rollup expression is
the MapReduce aggregation approach.





Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Apr 4, 2017 at 12:53 AM, adfel70  wrote:

> Specifically date ranges?
>
> I would like to perform some kind of OLAP cube on the data in solr, and
> looking at streaming expressions for this.
>
>
>
>
>


Using function queries for faceting

2017-04-04 Thread Georg Sorst
Hi list!

My documents are eCommerce items. They may have a special price for a
certain group of users, but not for other groups of users; in that case the
default price should be used. So the documents look something like
this:

item:
  id: 1
  price_default: 11.5
  price_group1: 11.2
item:
  id: 2
  price_default: 12.3
  price_group2: 12.5

Now when I want to fetch the documents and display the correct price for
group1 I can use 'fl=def(price_group1,price_default)'. Works like a charm!
It will return price_group1 for document 1 and price_default for document 2.

Is there a way to do this for faceting as well? I've unsuccessfully tried:

* facet.field=def(price_group1,price_default)
* facet.field=effective_price:def(price_group1,price_default)
* facet.field={!func}def(price_group1,price_default)
* facet.field={!func}effective_price:def(price_group1,price_default)
* json.facet={price:"def(price_group1,price_default)"}

I'm fine with either the "old" facet API or the JSON facets. Any ideas?

Thanks!
Georg


Fwd: Fq and termfrequency are not showing the correct results

2017-04-04 Thread Ayush Gupta
Hi Everyone,

I have a document that contains data like this "Bachelor's degree is easier
to get" in the 'body' field and I am making a query on this field searching
for the word 'Bachelor's degree' like this -
query?fq=body:"bachelor%27s%20degree"&fl=body_frequency:termfreq(body,"bachelor%27s%20degree"),body
and I am getting zero results in the response even when I have documents that
contains words like 'Bachelor's degree'.

I checked the Analysis tab in the admin panel, and there I can see the
WordDelimiterFilterFactory applied to the word 'Bachelor's Degree',
converting it to 'Bachelor degree'. So in both the Field Value (Query) and
Field Value (Index) columns the WordDelimiterFilterFactory is converting the
word 'Bachelor's Degree' to 'Bachelor Degree', so why am I getting zero
results when querying? I have attached the screenshots of my analysis page.


I have attached a code file 'code.txt' where you can see the code for the
field 'body'.


Please tell me what I am doing wrong.

Thanks



How to Apply 'implicit' routing in exist collection in solr 6.1.0

2017-04-04 Thread Ketan Thanki

Hi,

I need help with how to apply 'implicit' routing to existing collections.
E.g.: I have configured 2 collections, each with 4 shards and 4 replicas,
so what changes should I make to apply 'implicit' routing?

Please do the needful, with some examples.

Regards,
Ketan.




Re: Solr Cloud 6.5.0 Replicas go down while indexing

2017-04-04 Thread Salih Sen
Hi,

Sorry for the initial hurried mail, here are some corrections and further
explanation:

The problem I described previously was happening before we set the
zkClientTimeout value, so it was 3 when it happened.

autoCommit maxTime value is 15000 and autoSoftCommit maxTime is 6.

We recently removed maxDocs values from autoCommit settings and it seems
more stable so far and has better response time.

I can't seem to find these values in the Solr logs, probably because the
logging level is currently WARN, but we left those at their defaults, so I
think they're set to the values in solr.xml:
${distribUpdateSoTimeout:600000}
${distribUpdateConnTimeout:60000}


We have 12 replicas using default routing. All commits and queries are
going to a single node because of the dummy client we use. Documents are
sent in JSON format. I don't know the exact document size; they are mostly
news-article sized, though with lots of dynamic fields.

Sematext SPM currently shows “Added Docs Rate” as ~1.70k/sec for the server
that is receiving updates.

Once problem starts happening multiple replicas go down (not necessarily
the one receiving the update request from client) and cluster starts
returning errors to update requests.


We saw entries like the following in the ZooKeeper logs; that's why we
thought it might be related to the zkClientTimeout value.

2017-04-03 09:13:03,040 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
client /192.168.30.32:36420 which had sessionid 0x25ad61c4507008c
2017-04-03 09:27:02,078 [myid:1] - WARN  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid
0x15b14ba8a8e0026, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)
2017-04-03 09:27:02,079 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for
client /192.168.30.32:35636 which had sessionid 0x15b14ba8a8e0026
2017-04-03 09:35:19,362 [myid:1] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.30.32:37970



*Salih Şen*
M: +90 533 131 17 07
E: sa...@dilisim.com
W: www.dilisim.com
Skype: slhsen

On 3 April 2017 at 18:01:15, Erick Erickson (erickerick...@gmail.com) wrote:

bq: We set Auto hardcommit time to 15sec and 1 docs, and soft
commit to 6 sec and 5000 seconds

Just a sanity check, the commit intervals are in milliseconds, your
units look mixed up above, I'm guessing it's just a typo though. I
usually don't use maxDocs because it's unpredictable. Say you're
indexing at a furious rate. If you are indexing at 5,000 docs a second
(and assuming the above was supposed to be soft committing every 60
seconds or 5,000 docs) you'll still be autocommitting every second.

While that could be related, it's not particularly germane to your
timeout. My guess is that you're getting these errors on the leader?
what do you have in solr.xml for:

distribUpdateConnTimeout and distribUpdateSoTimeout

Those are likely the timeouts that matter. And how big are your
documents? The scenario I'm thinking of is that the leader sends the
update to the replica and the timeout for the replica's response
exceeds the ones above.

BTW, it can be useful on startup to look at your solr.log. The
_actual_ values for all the timeouts are printed out, including any
sysvars you've used.

And how are you indexing? Mostly I'm wondering how fast you're sending
docs to each leader and how.

Best,
Erick

On Mon, Apr 3, 2017 at 6:52 AM, Salih Sen  wrote:
> Hi,
>
> We have a three-server setup with each server having 756G RAM, 48 cores,
> 4 SSDs (each having three Solr instances on them) and a dedicated
> mechanical disk for zookeeper (3 zk instances total). Each Solr instance
> has 31G of heap space allocated to it. In total we have 36 Solr instances
> and 3 Zookeeper instances (with 1G heapspace). The servers also have a
> 10Gig network between them.
>
> We set Auto hardcommit time to 15sec and 1 docs, and soft commit to
> 6 sec and 5000 seconds in order to avoid soft committing too much and
> avoiding indexing bottlenecks. We also set DzkClientTimeout=9.
>
> But it seems replicas still randomly go down while indexing. Do you have
> any suggestions to prevent this situation?
>
> ERROR - 2017-04-03 12:24:02.503; [ ]
> org.apache.solr.cloud.OverseerCollectionMessageHandler; Error from shard:
> http://192.168.30.33:9132/solr
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://192.168.30.33:9132/solr
> at
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:621)
> at
> org.apache.solr.client.solrj.impl.HttpSolrClien