Re: Collection name in result

2017-06-22 Thread Erick Erickson
How are you submitting the query to the two collections? Aliasing them both?

The simplest approach would be just to index the name of the collection with
each doc and return that field.
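
For illustration, a minimal sketch of that approach (the field name
"source_collection" and the alias name "both" are made up here): add a
stored string field to each collection's schema,

  <field name="source_collection" type="string" indexed="true" stored="true"/>

set it to the collection's name on every document at index time, and ask
for it back with fl:

  /solr/both/select?q=foo&fl=id,score,source_collection

Depending on your version, the [shard] document transformer
(fl=id,[shard]) may also show where each document came from, though it
reports the shard URL rather than the collection name.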

Best,
Erick

On Thu, Jun 22, 2017 at 6:05 PM, Jagrut Sharma  wrote:
> I'm submitting a search term to SolrCloud to query 2 collections. The
> response that comes back does not have the collection name from which the
> result came.
>
> Is it possible to know the collection which returned the result?
>
> Thanks.
>
> --
> Jagrut


Collection name in result

2017-06-22 Thread Jagrut Sharma
I'm submitting a search term to SolrCloud to query 2 collections. The
response that comes back does not have the collection name from which the
result came.

Is it possible to know the collection which returned the result?

Thanks.

-- 
Jagrut


Re: Spatial Search based on the amount of docs, not the distance

2017-06-22 Thread Tim Casey
deniz,

I was going to add something here.  The reason what you want is probably
hard to do is that you are asking Solr, which stores individual documents,
to return documents using an attribute of document pairs.  As a thought
exercise only: if you stored record pairs as a single document, you could
probably query it directly.  That is, if you have d1 and d2 and you are
querying around d1 and ordering by distance, then you could get this
directly from a document representing the record pair.  I don't think this
is practical, because it is an n^2 store.

Since the n^2 problem is there, people are going to suggest some heuristic
which avoids it.  What Erick is suggesting is along this path:
query around a point and sort by distance, taking the top K results.  The
result is a linear slice of the n^2 distance attribute.

tim



On Wed, Jun 21, 2017 at 7:50 PM, Erick Erickson 
wrote:

> Would it serve to sort by distance? True, if you matched a zillion
> documents within a 1km radius you'd still perform the distance calcs, but
> the result would be a manageable number.
>
> I have to ask "Why do you care?". Is this an efficiency question (i.e. you
> want to keep Solr from having to do expensive work) or is it a question of
> having to get hits at all? It's at least possible that the solution for one
> is not the solution for the other.
>
> Best,
> Erick
>
> On Wed, Jun 21, 2017 at 5:32 PM, deniz  wrote:
>
> > it is for sure possible to use the d value for limiting the distance;
> > however, it might not be very efficient, as some of the coords may not
> > have any docs around for a large value of d... so it is hard to
> > determine a default value for d.
> >
> > though it sounds like having a default d and gradual increments on its
> > value might be a workaround for getting the top K results...
> >
> >
> >
> >
> >
> > -
> > Smart, but he doesn't work... If he worked, he'd do it...
> >
>
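
A sort-by-distance query of the kind Erick describes might look like this
(a sketch; the field name "location", the point, and the radius are
illustrative):

  q=*:*&fq={!geofilt}&sfield=location&pt=45.15,-93.85&d=5&sort=geodist() asc&rows=10

rows caps the response at the top K nearest documents, and d can be
widened gradually, as deniz suggests, if too few documents fall inside
the radius.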


Re: Mixing distrib=true and false in one request handler?

2017-06-22 Thread Walter Underwood
OK. We’re going with a separate call to /suggest. For those of us with 
controlled vocabularies, a suggest.distrib would be a handy thing.
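
For reference, the separate call might look something like this (a
sketch; the handler and dictionary names are illustrative):

  /solr/collection/suggest?suggest=true&suggest.dictionary=infixSuggester&suggest.count=10&suggest.q=netfl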

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 22, 2017, at 4:32 PM, Alessandro Benedetti  
> wrote:
> 
> Hi Walter,
> As I mentioned in the first mail, I don't think [1] will help; I was
> referring to the source code to explain that, in my opinion, such a
> feature is not available.
> Looking in the source code (the JavaDoc is not enough), that class
> presents the suggester params, and there is no param for the
> distribution of the request, I am afraid.
> 
> 
> [1] org.apache.solr.spelling.suggest.SuggesterParams
> 
> --
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> www.sease.io
> 
> On Thu, Jun 22, 2017 at 7:16 PM, Walter Underwood 
> wrote:
> 
>> I really don’t understand [1]. I read the JavaDoc for that, but how does
>> it help? What do I put in the solrconfig.xml?
>> 
>> I’m pretty good at figuring out Solr stuff. I started with Solr 1.2.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jun 22, 2017, at 2:12 AM, alessandro.benedetti 
>> wrote:
>>> 
>>> The short answer seems to be no [1].
>>> 
>>> On the other side, I discussed this in a couple of related Jira issues
>>> in the past, as I (and other people) believe we should always return
>>> unique suggestions anyway [2].
>>> 
>>> Although a year has passed, neither I nor others have actually
>>> progressed on that issue :(
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> [1] org.apache.solr.spelling.suggest.SuggesterParams
>>> [2] https://issues.apache.org/jira/browse/SOLR-8672 and mostly
>>> https://issues.apache.org/jira/browse/LUCENE-6336
>>> 
>>> 
>>> 
>>> -
>>> ---
>>> Alessandro Benedetti
>>> Search Consultant, R&D Software Engineer, Director
>>> Sease Ltd. - www.sease.io
>> 
>> 



Re: Mixing distrib=true and false in one request handler?

2017-06-22 Thread Alessandro Benedetti
Hi Walter,
As I mentioned in the first mail, I don't think [1] will help; I was
referring to the source code to explain that, in my opinion, such a
feature is not available.
Looking in the source code (the JavaDoc is not enough), that class
presents the suggester params, and there is no param for the
distribution of the request, I am afraid.


[1] org.apache.solr.spelling.suggest.SuggesterParams

--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Thu, Jun 22, 2017 at 7:16 PM, Walter Underwood 
wrote:

> I really don’t understand [1]. I read the JavaDoc for that, but how does
> it help? What do I put in the solrconfig.xml?
>
> I’m pretty good at figuring out Solr stuff. I started with Solr 1.2.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jun 22, 2017, at 2:12 AM, alessandro.benedetti 
> wrote:
> >
> > The short answer seems to be no [1].
> >
> > On the other side, I discussed this in a couple of related Jira issues
> > in the past, as I (and other people) believe we should always return
> > unique suggestions anyway [2].
> >
> > Although a year has passed, neither I nor others have actually
> > progressed on that issue :(
> >
> >
> >
> >
> >
> >
> >
> > [1] org.apache.solr.spelling.suggest.SuggesterParams
> > [2] https://issues.apache.org/jira/browse/SOLR-8672 and mostly
> > https://issues.apache.org/jira/browse/LUCENE-6336
> >
> >
> >
> > -
> > ---
> > Alessandro Benedetti
> > Search Consultant, R&D Software Engineer, Director
> > Sease Ltd. - www.sease.io
>
>


Re: SSN Regex Search

2017-06-22 Thread Erick Erickson
First, I wouldn't use regex queries; they're expensive.
WordDelimiter(Graph)Filter is designed for these use cases; have you
considered that?
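
For illustration, a field type along those lines might look like this (a
sketch, untested; the name "text_ssn" is made up, and catenateNumbers
also indexes the joined form 123456789):

  <fieldType name="text_ssn" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"
              generateNumberParts="1" catenateNumbers="1"/>
    </analyzer>
  </fieldType>

With that in place, a plain query for 123456789, or an escaped
123\-45\-6789, can match without a regex.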

And what do you mean: "special dash character issue"? Yes, it's the NOT
operator, but you can always escape it.

Best,
Erick

On Thu, Jun 22, 2017 at 1:54 PM, Furkan KAMACI 
wrote:

> Hi,
>
> How can I search with an SSN regex pattern that overcomes the special
> dash character issue?
>
> As you know, /[0-9]{3}-[0-9]{2}-[0-9]{4}/ will not work as intended.
>
> Kind Regards,
> Furkan KAMACI
>


Re: Streaming Expressions : rollup function returning results with duplicate tuples

2017-06-22 Thread Pratik Patel
Yes, that was the missing piece. Thanks a lot!

On Thu, Jun 22, 2017 at 5:20 PM, Joel Bernstein  wrote:

> Here is the pseudo code:
>
> rollup(sort(fetch(gatherNodes(
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 22, 2017 at 5:19 PM, Joel Bernstein 
> wrote:
>
> > You'll need to use the sort expression to sort the nodes by schemaType
> > first. The rollup expression is doing a MapReduce rollup that requires
> the
> > the records to be sorted by the "over" fields.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Thu, Jun 22, 2017 at 2:49 PM, Pratik Patel 
> wrote:
> >
> >> Hi,
> >>
> >> I have a streaming expression which uses rollup function. My
> understanding
> >> is that rollup takes an incoming stream and aggregates over given
> buckets.
> >> However, with following query the result contains duplicate tuples.
> >>
> >> Following is the streaming expression.
> >>
> >> rollup(
> >> fetch(
> >> collection1,
> >> gatherNodes(
> >> collection1,
> >> gatherNodes(collection1,
> >> walk="54227b412a1c4e574f88f2bb
> >> ->eventParticipantID",
> >> gather="eventID"
> >> ),
> >> walk="eventID->conceptid",
> >> gather="conceptid",
> >> trackTraversal="true", scatter="branches,leaves"
> >> ),
> >> fl="schematype",
> >> on="node=conceptid"
> >> ),
> >> over="schematype",
> >> count(schematype)
> >> )
> >>
> >> The result returned is as follows.
> >>
> >> {
> >>   "result-set": {
> >> "docs": [
> >>   {
> >> "count(schematype)": 1,
> >> "schematype": "Company"
> >>   },
> >>   {
> >> "count(schematype)": 1,
> >> "schematype": "Founding Event"
> >>   },
> >>   {
> >> "count(schematype)": 1,
> >> "schematype": "Customer"
> >>   },
> >>   {
> >> "count(schematype)": 1,
> >> "schematype": "Founding Event"  // duplicate
> >>   },
> >>   {
> >> "count(schematype)": 1,
> >> "schematype": "Employment"  // duplicate
> >>   },
> >>   {
> >> "count(schematype)": 1,
> >> "schematype": "Founding Event"
> >>   },
> >>   {
> >> "count(schematype)": 4,
> >> "schematype": "Employment"
> >>   },..
> >>  ]
> >>  }
> >>
> >> As you can see, there is more than one tuple for 'Founding
> >> Event'/'Employment'.
> >>
> >> Am I missing something here?
> >>
> >> Following is the content of stream which is wrapped by rollup, if it
> >> helps.
> >>
> >> // stream on which rollup is working
> >> {
> >>   "result-set": {
> >> "docs": [
> >>   {
> >> "node": "54227b412a1c4e574f88f2bb",
> >> "schematype": "Company",
> >> "collection": "collection1",
> >> "field": "node",
> >> "level": 0
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166aea5",
> >> "schematype": "Founding Event",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166ae99",
> >> "schematype": "Customer",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166aea1",
> >> "schematype": "Founding Event",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166ae78",
> >> "schematype": "Employment",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "54ee6178b54c1d65412b5f9f",
> >> "schematype": "Founding Event",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166ae7c",
> >> "schematype": "Employment",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166ae80",
> >> "schematype": "Employment",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166ae8a",
> >> "schematype": "Employment",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166ae94",
> >> "schematype": "Employment",
> >> "collection": "collection1",
> >> "field": "eventID",
> >> "level": 1
> >>   },
> >>   {
> >> "node": "543004f0c92c0a651166ae9d",
> >> "schematype": "C

Re: Streaming Expressions : rollup function returning results with duplicate tuples

2017-06-22 Thread Joel Bernstein
Here is the pseudo code:

rollup(sort(fetch(gatherNodes(
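
Fleshed out against the expression in question, that might look like this
(a sketch; only the sort wrapper is new, and the gatherNodes arguments
are elided):

rollup(
    sort(
        fetch(collection1,
              gatherNodes(...),
              fl="schematype",
              on="node=conceptid"),
        by="schematype asc"),
    over="schematype",
    count(schematype))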

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 22, 2017 at 5:19 PM, Joel Bernstein  wrote:

> You'll need to use the sort expression to sort the nodes by schemaType
> first. The rollup expression is doing a MapReduce rollup that requires
> the records to be sorted by the "over" fields.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Jun 22, 2017 at 2:49 PM, Pratik Patel  wrote:
>
>> Hi,
>>
>> I have a streaming expression which uses rollup function. My understanding
>> is that rollup takes an incoming stream and aggregates over given buckets.
>> However, with following query the result contains duplicate tuples.
>>
>> Following is the streaming expression.
>>
>> rollup(
>> fetch(
>> collection1,
>> gatherNodes(
>> collection1,
>> gatherNodes(collection1,
>> walk="54227b412a1c4e574f88f2bb
>> ->eventParticipantID",
>> gather="eventID"
>> ),
>> walk="eventID->conceptid",
>> gather="conceptid",
>> trackTraversal="true", scatter="branches,leaves"
>> ),
>> fl="schematype",
>> on="node=conceptid"
>> ),
>> over="schematype",
>> count(schematype)
>> )
>>
>> The result returned is as follows.
>>
>> {
>>   "result-set": {
>> "docs": [
>>   {
>> "count(schematype)": 1,
>> "schematype": "Company"
>>   },
>>   {
>> "count(schematype)": 1,
>> "schematype": "Founding Event"
>>   },
>>   {
>> "count(schematype)": 1,
>> "schematype": "Customer"
>>   },
>>   {
>> "count(schematype)": 1,
>> "schematype": "Founding Event"  // duplicate
>>   },
>>   {
>> "count(schematype)": 1,
>> "schematype": "Employment"  // duplicate
>>   },
>>   {
>> "count(schematype)": 1,
>> "schematype": "Founding Event"
>>   },
>>   {
>> "count(schematype)": 4,
>> "schematype": "Employment"
>>   },..
>>  ]
>>  }
>>
>> As you can see, there is more than one tuple for 'Founding
>> Event'/'Employment'.
>>
>> Am I missing something here?
>>
>> Following is the content of stream which is wrapped by rollup, if it
>> helps.
>>
>> // stream on which rollup is working
>> {
>>   "result-set": {
>> "docs": [
>>   {
>> "node": "54227b412a1c4e574f88f2bb",
>> "schematype": "Company",
>> "collection": "collection1",
>> "field": "node",
>> "level": 0
>>   },
>>   {
>> "node": "543004f0c92c0a651166aea5",
>> "schematype": "Founding Event",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166ae99",
>> "schematype": "Customer",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166aea1",
>> "schematype": "Founding Event",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166ae78",
>> "schematype": "Employment",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "54ee6178b54c1d65412b5f9f",
>> "schematype": "Founding Event",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166ae7c",
>> "schematype": "Employment",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166ae80",
>> "schematype": "Employment",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166ae8a",
>> "schematype": "Employment",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166ae94",
>> "schematype": "Employment",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "node": "543004f0c92c0a651166ae9d",
>> "schematype": "Customer",
>> "collection": "collection1",
>> "field": "eventID",
>> "level": 1
>>   },
>>   {
>> "EOF": true,
>> "RESPONSE_TIME": 38
>>   }
>> ]
>>   }
>> }
>>
>> If I rollup on the level field then the results are as expected but not
>> when the field is schematype. Any idea what's going on here?
>>
>>
>> Thanks,
>>
>> Pratik
>>
>
>


Re: Streaming Expressions : rollup function returning results with duplicate tuples

2017-06-22 Thread Joel Bernstein
You'll need to use the sort expression to sort the nodes by schemaType
first. The rollup expression is doing a MapReduce rollup that requires
the records to be sorted by the "over" fields.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Jun 22, 2017 at 2:49 PM, Pratik Patel  wrote:

> Hi,
>
> I have a streaming expression which uses rollup function. My understanding
> is that rollup takes an incoming stream and aggregates over given buckets.
> However, with following query the result contains duplicate tuples.
>
> Following is the streaming expression.
>
> rollup(
> fetch(
> collection1,
> gatherNodes(
> collection1,
> gatherNodes(collection1,
> walk="54227b412a1c4e574f88f2bb->
> eventParticipantID",
> gather="eventID"
> ),
> walk="eventID->conceptid",
> gather="conceptid",
> trackTraversal="true", scatter="branches,leaves"
> ),
> fl="schematype",
> on="node=conceptid"
> ),
> over="schematype",
> count(schematype)
> )
>
> The result returned is as follows.
>
> {
>   "result-set": {
> "docs": [
>   {
> "count(schematype)": 1,
> "schematype": "Company"
>   },
>   {
> "count(schematype)": 1,
> "schematype": "Founding Event"
>   },
>   {
> "count(schematype)": 1,
> "schematype": "Customer"
>   },
>   {
> "count(schematype)": 1,
> "schematype": "Founding Event"  // duplicate
>   },
>   {
> "count(schematype)": 1,
> "schematype": "Employment"  // duplicate
>   },
>   {
> "count(schematype)": 1,
> "schematype": "Founding Event"
>   },
>   {
> "count(schematype)": 4,
> "schematype": "Employment"
>   },..
>  ]
>  }
>
> As you can see, there is more than one tuple for 'Founding
> Event'/'Employment'.
>
> Am I missing something here?
>
> Following is the content of stream which is wrapped by rollup, if it helps.
>
> // stream on which rollup is working
> {
>   "result-set": {
> "docs": [
>   {
> "node": "54227b412a1c4e574f88f2bb",
> "schematype": "Company",
> "collection": "collection1",
> "field": "node",
> "level": 0
>   },
>   {
> "node": "543004f0c92c0a651166aea5",
> "schematype": "Founding Event",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166ae99",
> "schematype": "Customer",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166aea1",
> "schematype": "Founding Event",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166ae78",
> "schematype": "Employment",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "54ee6178b54c1d65412b5f9f",
> "schematype": "Founding Event",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166ae7c",
> "schematype": "Employment",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166ae80",
> "schematype": "Employment",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166ae8a",
> "schematype": "Employment",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166ae94",
> "schematype": "Employment",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "node": "543004f0c92c0a651166ae9d",
> "schematype": "Customer",
> "collection": "collection1",
> "field": "eventID",
> "level": 1
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 38
>   }
> ]
>   }
> }
>
> If I rollup on the level field then the results are as expected but not
> when the field is schematype. Any idea what's going on here?
>
>
> Thanks,
>
> Pratik
>


SSN Regex Search

2017-06-22 Thread Furkan KAMACI
Hi,

How can I search with an SSN regex pattern that overcomes the special
dash character issue?

As you know, /[0-9]{3}-[0-9]{2}-[0-9]{4}/ will not work as intended.

Kind Regards,
Furkan KAMACI


RE: How can I enable NER Plugin in Solr 6.x

2017-06-22 Thread Markus Jelsma
Solr hasn't got built-in support for NER, but you can try its UIMA integration 
with external third-party suppliers:
https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
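
For illustration, the UIMA integration hooks in as an update request
processor chain in solrconfig.xml, roughly along these lines (a sketch;
the uimaConfig contents are elided and depend on which third-party
analysis engine you pick):

  <updateRequestProcessorChain name="uima">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <lst name="uimaConfig">
        ...
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>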

 
 
-Original message-
> From:FOTACHE CHRISTIAN 
> Sent: Thursday 22nd June 2017 19:03
> To: Solr-user 
> Subject: How can I enable NER Plugin in Solr 6.x
> 
> I need to enable the NER plugin in Solr 6.x in order to extract locations from 
> the text when committing documents to Solr.
> How can I achieve this in the simplest way possible? Please help.
> Christian Fotache
> Tel: 0728.297.207
> 


Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-22 Thread Susheel Kumar
Hi Joel,

I am able to reproduce this in a simple way.  It looks like the let stream
is having some issues.  The complement function below works fine if I
execute it outside let and it returns an EOF:true tuple, but if a stream
whose only tuple is EOF:true is assigned to a let variable, it gets changed
to the EXCEPTION "Index: 0, Size: 0" etc.

So the let stream is not able to handle a stream/result which has only an
EOF tuple, and this breaks the whole let expression block.


===Complement inside let
let(
a=echo(Hello),
b=complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
asc"),
sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
on="id,email"),
c=get(b),
get(a)
)

Result
===
{
  "result-set": {
"docs": [
  {
"EXCEPTION": "Index: 0, Size: 0",
"EOF": true,
"RESPONSE_TIME": 1
  }
]
  }
}

===Complement outside let

complement(sort(select(tuple(id=1,email="A"),id,email),by="id asc,email
asc"),
sort(select(tuple(id=1,email="A"),id,email),by="id asc,email asc"),
on="id,email")

Result
===
{ "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 0 } ] } }








On Thu, Jun 22, 2017 at 11:55 AM, Susheel Kumar 
wrote:

> Sorry for the typo.
>
> I am facing a weird behavior when using hashJoin / innerJoin etc. The
> below expression displays the tuples from variable a, shown below.
>
>
> let(a=fetch(SMS,having(rollup(over=email,
>  count(email),
> select(search(SMS,
> q=*:*,
> fl="id,dv_sv_business_email",
> sort="dv_sv_business_email asc"),
>id,
>dv_sv_business_email as email)),
> eq(count(email),1)),
> fl="id,dv_sv_business_email as email",
> on="email=dv_sv_business_email"),
> b=fetch(SMS,having(rollup(over=email,
>  count(email),
> select(search(SMS,
> q=*:*,
> fl="id,dv_sv_personal_email",
> sort="dv_sv_personal_email asc"),
>id,
>dv_sv_personal_email as email)),
> eq(count(email),1)),
> fl="id,dv_sv_personal_email as email",
> on="email=dv_sv_personal_email"),
> c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
> asc"),on="email"),
> #d=select(get(c),id,email),
> get(a)
> )
>
> var a result
> ==
> {
>   "result-set": {
> "docs": [
>   {
> "count(email)": 1,
> "id": "1",
> "email": "A"
>   },
>   {
> "count(email)": 1,
> "id": "2",
> "email": "C"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 18
>   }
> ]
>   }
> }
>
> And after uncommenting var d above, even though we are displaying a, we get
> the results shown below. I understand that the join in my test data didn't
> find any match, but then it should not skew the results of var a.  When
> data matches during the join it's fine, but otherwise I am running into
> this issue and the whole chain of subsequent expressions doesn't get
> evaluated due to this...
>
>
> after uncommenting var d
> ===
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Index: 0, Size: 0",
> "EOF": true,
> "RESPONSE_TIME": 44
>   }
> ]
>   }
> }
>
> On Thu, Jun 22, 2017 at 11:51 AM, Susheel Kumar 
> wrote:
>
>> Hello Joel,
>>
>> Facing a weird behavior when using hashJoin / innerJoin etc. The below
>> expression display tuples from variable a   and the moment I use get on
>> innerJoin / hashJoin expr on variable c
>>
>>
>> let(a=fetch(SMS,having(rollup(over=email,
>>  count(email),
>> select(search(SMS,
>> q=*:*,
>> fl="id,dv_sv_business_email",
>> sort="dv_sv_business_email asc"),
>>id,
>>dv_sv_business_email as email)),
>> eq(count(email),1)),
>> fl="id,dv_sv_business_email as email",
>> on="email=dv_sv_business_email"),
>> b=fetch(SMS,having(rollup(over=email,
>>  count(email),
>> select(search(SMS,
>> q=*:*,
>> fl="id,dv_sv_personal_email",
>> sort="dv_sv_personal_email asc"),
>>id,
>>dv_sv_personal_email as email)),
>> eq(count(email),1)),
>> fl="id,dv_sv_personal_email as email",
>> on="email=dv_sv_personal_email"),
>> c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
>> asc"),on="email"),
>> #d=select(get(c),id,email),
>> get(a)
>> )
>>
>> var a result
>> ==
>> {
>>   "result-set": {
>> "docs": [
>>   {
>> "count(email)": 1,
>> "id": "1",
>> "email": "A"
>>   },
>>   {
>> "count(email)": 1,
>> "id": "2",
>> "email": "C"
>>   },
>>   {
>> "EOF": true,
>> "RESPONSE_TIME": 18
>>   }
>> ]
>>   }
>> }
>>
>> after uncommenting var d above, even though we are displaying a, we get
>> results like below. I understand that the join in my test data didn't
>> find any match, but then it should not skew the 

Re: Velocity UI with Analyzing Infix Suggester?

2017-06-22 Thread Walter Underwood
The problem with the suggest response is that the suggest.q value is used as an 
attribute in the JSON response. That is just weird.

Is there some way to put in a wildcard in the Velocity selector? 
“$response.response.terms.name” works for /terms, but /suggest is different. 
And I’m running two suggesters anyway, both fuzzy and infix.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 6, 2017, at 4:34 AM, Rick Leir  wrote:
> 
>> typeahead solutions using a separate collection
> 
> Erik, do you use a separate collection so it can be smaller and thereby 
> faster? Or so you can keep good performance on the main collection 
> server? In my mind, the performance of the as-you-type search is more 
> important than that of the regular search.
> Cheers -- Rick
> 
> On June 6, 2017 5:31:08 AM EDT, Erik Hatcher  wrote:
>> Walter -
>> 
>> I’ve done several one-off demos that have incorporated as-you-type Ajax
>> actions into /browse.   The first one I did was “instant search” (not
>> suggest) and left that sitting over at my “instant_search” branch - of
>> svn(!).  See the top two commits listed here:
>> https://github.com/erikhatcher/lucene-solr-svn/commits/instant_search
>> 
>> Lately I’ve been building typeahead solutions using a separate
>> collection rather than the Suggester component and wiring that into
>> /browse with just this sort of thing:
>> 
>>   $(function() { $(‘#search_box').bind("keyup",load_results); });
>> 
>> where load_results() does this:
>> 
>> $(‘#results’).load(…url with q=…)
>> 
>> It’s awesome to hear you use wt=velocity - made my day!   And by “in
>> 6.5.1” you mean it is in the way old tech products configset where it
>> uses an ancient jquery.autocomplete feature.  You could probably adapt
>> that bit straightforwardly to another endpoint and adjusting the
>> `extraParams` in there appropriately.  The trick used here is that the
>> response from /terms is simply a single suggestion per line in plain
>> text, by way of using wt=velocity with v.template=suggest:
>> 
>> #foreach($t in $response.response.terms.name)
>> $t.key
>> #end
>> 
>> Adjust that template to deal with your suggester end-point response so
>> that it writes out one per line as plain text and you’re there.   
>> Happy to help further if you run into any issues.
>> 
>> And yes, it’d be nice if this got built-in more modernly into the out
>> of the box /browse.  If you want to open a JIRA and hack through it
>> together I’m game.
>> 
>>  Erik
>> 
>> 
>>> On Jun 5, 2017, at 4:14 PM, Walter Underwood 
>> wrote:
>>> 
>>> Does anyone have the new suggester working in the Velocity browse UI?
>> In 6.5.1, it uses the terms component.
>>> 
>>> I could probably figure out how to do that in Velocity, but if
>> someone has already done that, it would be great.
>>> 
>>> We use the Velocity UI as an internal exploration and diagnostic
>> search page.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
> 
> -- 
> Sorry for being brief. Alternate email is rickleir at yahoo dot com



Streaming Expressions : rollup function returning results with duplicate tuples

2017-06-22 Thread Pratik Patel
Hi,

I have a streaming expression which uses rollup function. My understanding
is that rollup takes an incoming stream and aggregates over given buckets.
However, with following query the result contains duplicate tuples.

Following is the streaming expression.

rollup(
fetch(
collection1,
gatherNodes(
collection1,
gatherNodes(collection1,
walk="54227b412a1c4e574f88f2bb->eventParticipantID",
gather="eventID"
),
walk="eventID->conceptid",
gather="conceptid",
trackTraversal="true", scatter="branches,leaves"
),
fl="schematype",
on="node=conceptid"
),
over="schematype",
count(schematype)
)

The result returned is as follows.

{
  "result-set": {
"docs": [
  {
"count(schematype)": 1,
"schematype": "Company"
  },
  {
"count(schematype)": 1,
"schematype": "Founding Event"
  },
  {
"count(schematype)": 1,
"schematype": "Customer"
  },
  {
"count(schematype)": 1,
"schematype": "Founding Event"  // duplicate
  },
  {
"count(schematype)": 1,
"schematype": "Employment"  // duplicate
  },
  {
"count(schematype)": 1,
"schematype": "Founding Event"
  },
  {
"count(schematype)": 4,
"schematype": "Employment"
  },..
 ]
 }

As you can see, there is more than one tuple for 'Founding
Event'/'Employment'.

Am I missing something here?

Following is the content of stream which is wrapped by rollup, if it helps.

// stream on which rollup is working
{
  "result-set": {
"docs": [
  {
"node": "54227b412a1c4e574f88f2bb",
"schematype": "Company",
"collection": "collection1",
"field": "node",
"level": 0
  },
  {
"node": "543004f0c92c0a651166aea5",
"schematype": "Founding Event",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166ae99",
"schematype": "Customer",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166aea1",
"schematype": "Founding Event",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166ae78",
"schematype": "Employment",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "54ee6178b54c1d65412b5f9f",
"schematype": "Founding Event",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166ae7c",
"schematype": "Employment",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166ae80",
"schematype": "Employment",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166ae8a",
"schematype": "Employment",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166ae94",
"schematype": "Employment",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"node": "543004f0c92c0a651166ae9d",
"schematype": "Customer",
"collection": "collection1",
"field": "eventID",
"level": 1
  },
  {
"EOF": true,
"RESPONSE_TIME": 38
  }
]
  }
}

If I rollup on the level field then the results are as expected but not
when the field is schematype. Any idea what's going on here?


Thanks,

Pratik


Re: Complement Stream function - Invalid ReducerStream - substream comparator (sort) must be a superset of this stream's comparator

2017-06-22 Thread Susheel Kumar
Please let me know if I should create a JIRA; I can provide both the
expressions and the data to reproduce it.

On Thu, Jun 22, 2017 at 11:23 AM, Susheel Kumar 
wrote:

> Yes, I tried building up the expression piece by piece, but it looks like
> there is an issue with how complement expects/behaves with respect to sort.
>
> If I use the g and h expressions below inside complement, which are
> already sorted (via sort), then it doesn't work:
>
> e=select(get(c),id,email),
> f=select(get(d),id,email),
> g=sort(get(e),by="id asc,email asc"),
> h=sort(get(f),by="id asc,email asc"),
> i=complement(get(g),get(h),on="id,email"),
>
> while the following worked, where I use the e and f expressions and sort
> them within the complement function instead of using g and h directly:
>
> e=select(get(c),id,email),
> f=select(get(d),id,email),
> g=sort(get(e),by="id asc,email asc"),
> h=sort(get(f),by="id asc,email asc"),
> i=complement(
> sort(get(e),by="id asc,email asc"),sort(get(f),by="id asc,email asc")
> ,on="id,email"),
>
> So I am good for now with the above approach, but I am running into
> another issue with an empty/null/"Index: 0, Size: 0" set and will start
> another thread for that (need your help there :-)).
>
> I appreciate all your help while I try to solve my use case using
> streaming expressions.
>
>
> On Thu, Jun 22, 2017 at 11:10 AM, Joel Bernstein 
> wrote:
>
>> I suspect something is wrong in the syntax but I'm not seeing it.
>>
>> Have you tried building up the expression piece by piece until you get the
>> syntax error?
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, Jun 21, 2017 at 3:20 PM, Susheel Kumar 
>> wrote:
>>
>> > While simple complement works in this way
>> >
>> > ===
>> > complement(merge(sort(select(echo("A"),echo as email),by="email asc"),
>> > sort(select(echo("B"),echo as email),by="email asc"),
>> > on="email asc"),
>> > merge(sort(select(echo("A"),echo as email),by="email asc"),
>> > sort(select(echo("D"),echo as email),by="email asc"),on="email asc"),
>> > on="email")
>> >
>> > BUT below it doesn't work when used in similar way
>> >
>> > ===
>> > let(a=fetch(collection1,having(rollup(over=email,
>> >  count(email),
>> > select(search(collection1,
>> > q=*:*,
>> > fl="id,business_email",
>> > sort="business_email asc"),
>> >id,
>> >business_email as email)),
>> > eq(count(email),1)),
>> > fl="id,business_email as email",
>> > on="email=business_email"),
>> > b=fetch(collection1,having(rollup(over=email,
>> >  count(email),
>> > select(search(collection1,
>> > q=*:*,
>> > fl="id,personal_email",
>> > sort="personal_email asc"),
>> >id,
>> >personal_email as email)),
>> > eq(count(email),1)),
>> > fl="id,personal_email as email",
>> > on="email=personal_email"),
>> > c=hashJoin(get(a),hashed=get(b),on="email"),
>> > d=hashJoin(get(b),hashed=get(a),on="email"),
>> > e=select(get(c),id,email),
>> > f=select(get(d),id,email),
>> > g=sort(get(e),by="id asc,email asc"),
>> > h=sort(get(f),by="id asc,email asc"),
>> > i=complement(get(g),get(h),on="id,email"),
>> > get(i)
>> > )
>> >
>> >
>> > On Wed, Jun 21, 2017 at 11:29 AM, Susheel Kumar 
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > Two issues with complement function (solr 6.6)
>> > >
>> > > 1)  When i execute below streaming expression,
>> > >
>> > > ==
>> > >
>> > > let(a=fetch(collection1,having(rollup(over=email,
>> > >  count(email),
>> > > select(search(collection1,
>> > > q=*:*,
>> > > fl="id,business_email",
>> > > sort="business_email asc"),
>> > >id,
>> > >business_email as email)),
>> > > eq(count(email),1)),
>> > > fl="id,business_email as email",
>> > > on="email=business_email"),
>> > > b=fetch(collection1,having(rollup(over=email,
>> > >  count(email),
>> > > select(search(collection1,
>> > > q=*:*,
>> > > fl="id,personal_email",
>> > > sort="personal_email asc"),
>> > >id,
>> > >personal_email as email)),
>> > > eq(count(email),1)),
>> > > fl="id,personal_email as email",
>> > > on="email=personal_email"),
>> > > c=hashJoin(get(a),hashed=get(b),on="email"),
>> > > d=hashJoin(get(b),hashed=get(a),on="email"),
>> > > e=select(get(c),id,email),
>> > > f=select(get(d),id,email),
>> > > g=sort(get(e),by="id asc,email asc"),
>> > > h=sort(get(f),by="id asc,email asc"),
>> > > i=complement(get(g),get(h),on="id,email"),
>> > > get(i)
>> > > )
>> > >
>> > >
>> > > getting response as
>> > >
>> > > { "result-set": { "docs": [ { "EXCEPTION": "Invalid ReducerStream -
>> > > substream comparator (sort) must be a superset of this stream's
>> > > comparator.", "EOF": true } ] } }
>> > >
>> > > 2) when i execute below
>> > >
>> > >

Re: Mixing distrib=true and false in one request handler?

2017-06-22 Thread Walter Underwood
I really don’t understand [1]. I read the JavaDoc for that, but how does it 
help? What do I put in the solrconfig.xml?

I’m pretty good at figuring out Solr stuff. I started with Solr 1.2.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 22, 2017, at 2:12 AM, alessandro.benedetti  
> wrote:
> 
> The short answer seems to be no [1].
> 
> On the other side, I discussed this in a couple of related Jira issues in the
> past, as I (and other people) believe we should always return unique
> suggestions anyway [2].
> 
> Although a year has passed, neither I nor others have actually progressed on
> that issue :(
> 
> 
> 
> 
> 
> 
> 
> [1] org.apache.solr.spelling.suggest.SuggesterParams
> [2] https://issues.apache.org/jira/browse/SOLR-8672 and mostly
> https://issues.apache.org/jira/browse/LUCENE-6336
> 
> 
> 
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io



How can I enable NER Plugin in Solr 6.x

2017-06-22 Thread FOTACHE CHRISTIAN
I need to enable the NER plugin in Solr 6.x in order to extract locations from the 
text when committing documents to Solr.
How can I achieve this in the simplest way possible? Please help.
Christian Fotache
Tel: 0728.297.207


Re: Error after moving index

2017-06-22 Thread Erick Erickson
"They're just files, man". If you can afford a bit of down-time, you can
shut your Solr down and recursively copy the data directory from your
source to destingation. SCP, rsync, whatever then restart solr.
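
For example (a sketch; the paths are illustrative, and Solr should be
stopped on both ends first):

  rsync -av /var/solr/data/mycore/data/ user@prod:/var/solr/data/mycore/data/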

Do take some care when copying between Windows and *nix that you do a
_binary_ transfer.

If you continually have a problem with transferring between Windows and
*nix we'll have to investigate further. And I'm assuming this is
stand-alone and there's only a single restore going on at a time.

Best,
Erick

On Thu, Jun 22, 2017 at 9:13 AM, Moritz Michael 
wrote:

> BTW, is there a better/recommended way to transfer an
> index to another solr?
>
> On Thu, Jun 22, 2017 at 6:09 PM +0200, "Moritz Michael" <
> moritz.mu...@gmail.com> wrote:
>
> Hello Michael,
> I used the backup functionality to create a snapshot and uploaded this
> snapshot, so I feel it should be safe.
> I'll try it again. Maybe the copy operation wasn't successful.
> Best,
> Moritz
>
>
>
> _
> From: Michael Kuhlmann 
> Sent: Thursday, June 22, 2017 2:50 PM
> Subject: Re: Error after moving index
> To:  
>
>
> Hi Moritz,
>
> did you stop your local Solr server before? Copying data from a running
> instance may cause headaches.
>
> If yes, what happens if you copy everything again? It seems that your
> copy operation wasn't successful.
>
> Best,
> Michael
>
> On 22.06.2017 at 14:37, Moritz Munte wrote:
> > Hello,
> >
> >
> >
> > I created an index on my local machine (Windows 10) and it works fine
> there.
> >
> > After uploading the index to the production server (Linux), the server
> shows
> > an error:
> .


Re: Error after moving index

2017-06-22 Thread Susheel Kumar
Usually we index directly into the Prod Solr rather than copying from
local/lower environments.  If that works in your scenario, I would suggest
indexing directly into Prod rather than copying/restoring from the local
Windows env to Linux.

On Thu, Jun 22, 2017 at 12:13 PM, Moritz Michael 
wrote:

> BTW, is there a better/recommended way to transfer an
> index to another solr?
>
> On Thu, Jun 22, 2017 at 6:09 PM +0200, "Moritz Michael" <
> moritz.mu...@gmail.com> wrote:
>
> Hello Michael,
> I used the backup functionality to create a snapshot and uploaded this
> snapshot, so I feel it should be safe.
> I'll try it again. Maybe the copy operation wasn't successful.
> Best,
> Moritz
>
>
>
> _
> From: Michael Kuhlmann 
> Sent: Thursday, June 22, 2017 2:50 PM
> Subject: Re: Error after moving index
> To:  
>
>
> Hi Moritz,
>
> did you stop your local Solr server before? Copying data from a running
> instance may cause headaches.
>
> If yes, what happens if you copy everything again? It seems that your
> copy operation wasn't successful.
>
> Best,
> Michael
>
> On 22.06.2017 at 14:37, Moritz Munte wrote:
> > Hello,
> >
> >
> >
> > I created an index on my local machine (Windows 10) and it works fine
> there.
> >
> > After uploading the index to the production server (Linux), the server
> shows
> > an error:
> .


Re: Error after moving index

2017-06-22 Thread Moritz Michael

BTW, is there a better/recommended way to transfer an index to 
another solr?

On Thu, Jun 22, 2017 at 6:09 PM +0200, "Moritz Michael" 
 wrote:

Hello Michael,
I used the backup functionality to create a snapshot and uploaded this 
snapshot, so I feel it should be safe.
I'll try it again. Maybe the copy operation wasn't successful.
Best,
Moritz



_
From: Michael Kuhlmann 
Sent: Thursday, June 22, 2017 2:50 PM
Subject: Re: Error after moving index
To:  


Hi Moritz,

did you stop your local Solr server before? Copying data from a running
instance may cause headaches.

If yes, what happens if you copy everything again? It seems that your
copy operation wasn't successful.

Best,
Michael

On 22.06.2017 at 14:37, Moritz Munte wrote:
> Hello,
>
>  
>
> I created an index on my local machine (Windows 10) and it works fine there.
>
> After uploading the index to the production server (Linux), the server shows
> an error:
.


Re: Error after moving index

2017-06-22 Thread Moritz Michael

Hello Michael,
I used the backup functionality to create a snapshot and uploaded this 
snapshot, so I feel it should be safe.
I'll try it again. Maybe the copy operation wasn't successful.
Best,
Moritz



_
From: Michael Kuhlmann 
Sent: Thursday, June 22, 2017 2:50 PM
Subject: Re: Error after moving index
To:  


Hi Moritz,

did you stop your local Solr server before? Copying data from a running
instance may cause headaches.

If yes, what happens if you copy everything again? It seems that your
copy operation wasn't successful.

Best,
Michael

On 22.06.2017 at 14:37, Moritz Munte wrote:
> Hello,
>
>  
>
> I created an index on my local machine (Windows 10) and it works fine there.
>
> After uploading the index to the production server (Linux), the server shows
> an error:
.


Re: Index 0, Size 0 - hashJoin Stream function Error

2017-06-22 Thread Susheel Kumar
Sorry for the typo.

I am facing a weird behavior when using hashJoin / innerJoin etc. The
below expression displays the tuples from variable a, shown below.


let(a=fetch(SMS,having(rollup(over=email,
 count(email),
select(search(SMS,
q=*:*,
fl="id,dv_sv_business_email",
sort="dv_sv_business_email asc"),
   id,
   dv_sv_business_email as email)),
eq(count(email),1)),
fl="id,dv_sv_business_email as email",
on="email=dv_sv_business_email"),
b=fetch(SMS,having(rollup(over=email,
 count(email),
select(search(SMS,
q=*:*,
fl="id,dv_sv_personal_email",
sort="dv_sv_personal_email asc"),
   id,
   dv_sv_personal_email as email)),
eq(count(email),1)),
fl="id,dv_sv_personal_email as email",
on="email=dv_sv_personal_email"),
c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
asc"),on="email"),
#d=select(get(c),id,email),
get(a)
)

var a result
==
{
  "result-set": {
"docs": [
  {
"count(email)": 1,
"id": "1",
"email": "A"
  },
  {
"count(email)": 1,
"id": "2",
"email": "C"
  },
  {
"EOF": true,
"RESPONSE_TIME": 18
  }
]
  }
}

And after uncommenting var d above, even though we are displaying a, we get
the results shown below. I understand that the join in my test data didn't
find any match, but then it should not skew the results of var a.  When
data matches during the join it's fine, but otherwise I am running into
this issue and the whole chain of subsequent expressions doesn't get
evaluated due to this...


after uncommenting var d
===
{
  "result-set": {
"docs": [
  {
"EXCEPTION": "Index: 0, Size: 0",
"EOF": true,
"RESPONSE_TIME": 44
  }
]
  }
}

On Thu, Jun 22, 2017 at 11:51 AM, Susheel Kumar 
wrote:

> Hello Joel,
>
> Facing a weird behavior when using hashJoin / innerJoin etc. The below
> expression display tuples from variable a   and the moment I use get on
> innerJoin / hashJoin expr on variable c
>
>
> let(a=fetch(SMS,having(rollup(over=email,
>  count(email),
> select(search(SMS,
> q=*:*,
> fl="id,dv_sv_business_email",
> sort="dv_sv_business_email asc"),
>id,
>dv_sv_business_email as email)),
> eq(count(email),1)),
> fl="id,dv_sv_business_email as email",
> on="email=dv_sv_business_email"),
> b=fetch(SMS,having(rollup(over=email,
>  count(email),
> select(search(SMS,
> q=*:*,
> fl="id,dv_sv_personal_email",
> sort="dv_sv_personal_email asc"),
>id,
>dv_sv_personal_email as email)),
> eq(count(email),1)),
> fl="id,dv_sv_personal_email as email",
> on="email=dv_sv_personal_email"),
> c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
> asc"),on="email"),
> #d=select(get(c),id,email),
> get(a)
> )
>
> var a result
> ==
> {
>   "result-set": {
> "docs": [
>   {
> "count(email)": 1,
> "id": "1",
> "email": "A"
>   },
>   {
> "count(email)": 1,
> "id": "2",
> "email": "C"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 18
>   }
> ]
>   }
> }
>
> after uncommenting var d above, even though we are displaying a, we get
> results like below. I understand that the join in my test data didn't
> find any match, but then it should not skew the results of var a.  When
> data matches during the join it's fine, but otherwise I am running into
> this issue and the whole chain of subsequent expressions doesn't get
> evaluated due to this...
>
>
> {
>   "result-set": {
> "docs": [
>   {
> "EXCEPTION": "Index: 0, Size: 0",
> "EOF": true,
> "RESPONSE_TIME": 44
>   }
> ]
>   }
> }
>


Index 0, Size 0 - hashJoin Stream function Error

2017-06-22 Thread Susheel Kumar
Hello Joel,

Facing a weird behavior when using hashJoin / innerJoin etc. The below
expression display tuples from variable a   and the moment I use get on
innerJoin / hashJoin expr on variable c


let(a=fetch(SMS,having(rollup(over=email,
 count(email),
select(search(SMS,
q=*:*,
fl="id,dv_sv_business_email",
sort="dv_sv_business_email asc"),
   id,
   dv_sv_business_email as email)),
eq(count(email),1)),
fl="id,dv_sv_business_email as email",
on="email=dv_sv_business_email"),
b=fetch(SMS,having(rollup(over=email,
 count(email),
select(search(SMS,
q=*:*,
fl="id,dv_sv_personal_email",
sort="dv_sv_personal_email asc"),
   id,
   dv_sv_personal_email as email)),
eq(count(email),1)),
fl="id,dv_sv_personal_email as email",
on="email=dv_sv_personal_email"),
c=innerJoin(sort(get(a),by="email asc"),sort(get(b),by="email
asc"),on="email"),
#d=select(get(c),id,email),
get(a)
)

var a result
==
{
  "result-set": {
"docs": [
  {
"count(email)": 1,
"id": "1",
"email": "A"
  },
  {
"count(email)": 1,
"id": "2",
"email": "C"
  },
  {
"EOF": true,
"RESPONSE_TIME": 18
  }
]
  }
}

after uncommenting var d above, even though we are displaying a, we get
results like below. I understand that the join in my test data didn't
find any match, but then it should not skew the results of var a.  When
data matches during the join it's fine, but otherwise I am running into
this issue and the whole chain of subsequent expressions doesn't get
evaluated due to this...


{
  "result-set": {
"docs": [
  {
"EXCEPTION": "Index: 0, Size: 0",
"EOF": true,
"RESPONSE_TIME": 44
  }
]
  }
}


Re: Complement Stream function - Invalid ReducerStream - substream comparator (sort) must be a superset of this stream's comparator

2017-06-22 Thread Susheel Kumar
Yes, I tried building up the expression piece by piece, but it looks like
there is an issue with how complement expects/behaves with respect to sort.

If I use the g and h expressions below inside complement, which are already
sorted (via sort), then it doesn't work:

e=select(get(c),id,email),
f=select(get(d),id,email),
g=sort(get(e),by="id asc,email asc"),
h=sort(get(f),by="id asc,email asc"),
i=complement(get(g),get(h),on="id,email"),

while the following worked, where I use the e and f expressions and sort
them within the complement function instead of using g and h directly:

e=select(get(c),id,email),
f=select(get(d),id,email),
g=sort(get(e),by="id asc,email asc"),
h=sort(get(f),by="id asc,email asc"),
i=complement(
sort(get(e),by="id asc,email asc"),sort(get(f),by="id asc,email asc")
,on="id,email"),

So I am good for now with the above approach, but I am running into another
issue with an empty/null/"Index: 0, Size: 0" set and will start another
thread for that (need your help there :-)).

I appreciate all your help while I try to solve my use case using
streaming expressions.


On Thu, Jun 22, 2017 at 11:10 AM, Joel Bernstein  wrote:

> I suspect something is wrong in the syntax but I'm not seeing it.
>
> Have you tried building up the expression piece by piece until you get the
> syntax error?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jun 21, 2017 at 3:20 PM, Susheel Kumar 
> wrote:
>
> > While simple complement works in this way
> >
> > ===
> > complement(merge(sort(select(echo("A"),echo as email),by="email asc"),
> > sort(select(echo("B"),echo as email),by="email asc"),
> > on="email asc"),
> > merge(sort(select(echo("A"),echo as email),by="email asc"),
> > sort(select(echo("D"),echo as email),by="email asc"),on="email asc"),
> > on="email")
> >
> > BUT below it doesn't work when used in similar way
> >
> > ===
> > let(a=fetch(collection1,having(rollup(over=email,
> >  count(email),
> > select(search(collection1,
> > q=*:*,
> > fl="id,business_email",
> > sort="business_email asc"),
> >id,
> >business_email as email)),
> > eq(count(email),1)),
> > fl="id,business_email as email",
> > on="email=business_email"),
> > b=fetch(collection1,having(rollup(over=email,
> >  count(email),
> > select(search(collection1,
> > q=*:*,
> > fl="id,personal_email",
> > sort="personal_email asc"),
> >id,
> >personal_email as email)),
> > eq(count(email),1)),
> > fl="id,personal_email as email",
> > on="email=personal_email"),
> > c=hashJoin(get(a),hashed=get(b),on="email"),
> > d=hashJoin(get(b),hashed=get(a),on="email"),
> > e=select(get(c),id,email),
> > f=select(get(d),id,email),
> > g=sort(get(e),by="id asc,email asc"),
> > h=sort(get(f),by="id asc,email asc"),
> > i=complement(get(g),get(h),on="id,email"),
> > get(i)
> > )
> >
> >
> > On Wed, Jun 21, 2017 at 11:29 AM, Susheel Kumar 
> > wrote:
> >
> > > Hi,
> > >
> > > Two issues with complement function (solr 6.6)
> > >
> > > 1)  When i execute below streaming expression,
> > >
> > > ==
> > >
> > > let(a=fetch(collection1,having(rollup(over=email,
> > >  count(email),
> > > select(search(collection1,
> > > q=*:*,
> > > fl="id,business_email",
> > > sort="business_email asc"),
> > >id,
> > >business_email as email)),
> > > eq(count(email),1)),
> > > fl="id,business_email as email",
> > > on="email=business_email"),
> > > b=fetch(collection1,having(rollup(over=email,
> > >  count(email),
> > > select(search(collection1,
> > > q=*:*,
> > > fl="id,personal_email",
> > > sort="personal_email asc"),
> > >id,
> > >personal_email as email)),
> > > eq(count(email),1)),
> > > fl="id,personal_email as email",
> > > on="email=personal_email"),
> > > c=hashJoin(get(a),hashed=get(b),on="email"),
> > > d=hashJoin(get(b),hashed=get(a),on="email"),
> > > e=select(get(c),id,email),
> > > f=select(get(d),id,email),
> > > g=sort(get(e),by="id asc,email asc"),
> > > h=sort(get(f),by="id asc,email asc"),
> > > i=complement(get(g),get(h),on="id,email"),
> > > get(i)
> > > )
> > >
> > >
> > > getting response as
> > >
> > > { "result-set": { "docs": [ { "EXCEPTION": "Invalid ReducerStream -
> > > substream comparator (sort) must be a superset of this stream's
> > > comparator.", "EOF": true } ] } }
> > >
> > > 2) when i execute below
> > >
> > >
> > > complement(
> > >   select(search(collection1, q=*:*, fl="id,business_email", sort="id
> > asc, business_email asc"),id,business_email as email),
> > >   select(search(collection1, q=*:*, fl="id,personal_email", sort="id
> > asc, personal_email asc"),id,personal_email as email),
> > >   on="id,email"
> > > )
> > >
> > > g

Re: Complement Stream function - Invalid ReducerStream - substream comparator (sort) must be a superset of this stream's comparator

2017-06-22 Thread Joel Bernstein
I suspect something is wrong in the syntax but I'm not seeing it.

Have you tried building up the expression piece by piece until you get the
syntax error?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 21, 2017 at 3:20 PM, Susheel Kumar 
wrote:

> While simple complement works in this way
>
> ===
> complement(merge(sort(select(echo("A"),echo as email),by="email asc"),
> sort(select(echo("B"),echo as email),by="email asc"),
> on="email asc"),
> merge(sort(select(echo("A"),echo as email),by="email asc"),
> sort(select(echo("D"),echo as email),by="email asc"),on="email asc"),
> on="email")
>
> BUT below it doesn't work when used in similar way
>
> ===
> let(a=fetch(collection1,having(rollup(over=email,
>  count(email),
> select(search(collection1,
> q=*:*,
> fl="id,business_email",
> sort="business_email asc"),
>id,
>business_email as email)),
> eq(count(email),1)),
> fl="id,business_email as email",
> on="email=business_email"),
> b=fetch(collection1,having(rollup(over=email,
>  count(email),
> select(search(collection1,
> q=*:*,
> fl="id,personal_email",
> sort="personal_email asc"),
>id,
>personal_email as email)),
> eq(count(email),1)),
> fl="id,personal_email as email",
> on="email=personal_email"),
> c=hashJoin(get(a),hashed=get(b),on="email"),
> d=hashJoin(get(b),hashed=get(a),on="email"),
> e=select(get(c),id,email),
> f=select(get(d),id,email),
> g=sort(get(e),by="id asc,email asc"),
> h=sort(get(f),by="id asc,email asc"),
> i=complement(get(g),get(h),on="id,email"),
> get(i)
> )
>
>
> On Wed, Jun 21, 2017 at 11:29 AM, Susheel Kumar 
> wrote:
>
> > Hi,
> >
> > Two issues with complement function (solr 6.6)
> >
> > 1)  When i execute below streaming expression,
> >
> > ==
> >
> > let(a=fetch(collection1,having(rollup(over=email,
> >  count(email),
> > select(search(collection1,
> > q=*:*,
> > fl="id,business_email",
> > sort="business_email asc"),
> >id,
> >business_email as email)),
> > eq(count(email),1)),
> > fl="id,business_email as email",
> > on="email=business_email"),
> > b=fetch(collection1,having(rollup(over=email,
> >  count(email),
> > select(search(collection1,
> > q=*:*,
> > fl="id,personal_email",
> > sort="personal_email asc"),
> >id,
> >personal_email as email)),
> > eq(count(email),1)),
> > fl="id,personal_email as email",
> > on="email=personal_email"),
> > c=hashJoin(get(a),hashed=get(b),on="email"),
> > d=hashJoin(get(b),hashed=get(a),on="email"),
> > e=select(get(c),id,email),
> > f=select(get(d),id,email),
> > g=sort(get(e),by="id asc,email asc"),
> > h=sort(get(f),by="id asc,email asc"),
> > i=complement(get(g),get(h),on="id,email"),
> > get(i)
> > )
> >
> >
> > getting response as
> >
> > { "result-set": { "docs": [ { "EXCEPTION": "Invalid ReducerStream -
> > substream comparator (sort) must be a superset of this stream's
> > comparator.", "EOF": true } ] } }
> >
> > 2) when i execute below
> >
> >
> > complement(
> >   select(search(collection1, q=*:*, fl="id,business_email", sort="id
> asc, business_email asc"),id,business_email as email),
> >   select(search(collection1, q=*:*, fl="id,personal_email", sort="id
> asc, personal_email asc"),id,personal_email as email),
> >   on="id,email"
> > )
> >
> > getting response as
> >
> > {
> >   "result-set": {
> > "docs": [
> >   {
> > "EXCEPTION": "Invalid expression complement(
> select(search(collection1, q=*:*, fl=\"id,business_email\", sort=\"id asc,
> business_email asc\"),id,business_email as email),
> select(search(collection1, q=*:*, fl=\"id,personal_email\", sort=\"id asc,
> personal_email asc\"),id,personal_email as email),  on=\"id,email\") -
> unknown operands found",
> > "EOF": true
> >   }
> > ]
> >   }
> > }
> >
> >
>


Re: com.ibm.icu dependency errors when building solr source code

2017-06-22 Thread Amrit Sarkar
Running "ant eclipse" or "ant test" in verbose mode will provide you the
exact lib in ivy2 cache which is corrupt. Delete that particular lib and
run "ant" again. Also don't try to get out / exit  "ant" commands via
Ctrl+C or Ctrl+V while it is downloading the libraries to ivy2 folder.


Re: com.ibm.icu dependency errors when building solr source code

2017-06-22 Thread Erick Erickson
Sometimes I've seen something like this when the ivy cache is corrupt. It's
a pain since it takes a while to re-download things, but you might try
removing that entire cache. On my Mac that's 'rm -rf ~/.ivy2/cache'

Erick

On Thu, Jun 22, 2017 at 3:39 AM, Susheel Kumar 
wrote:

> Hello,
>
> Am I missing something, or is the source code broken? I took the latest
> code from master, and when running "ant eclipse" or "ant test" I get the
> error below.
>
> ivy-configure:
>
> [ivy:configure] :: loading settings :: file =
> /Users/kumars5/src/git/code/lucene-solr/lucene/top-level-ivy-settings.xml
>
>
> resolve:
>
> [ivy:retrieve]
>
> [ivy:retrieve] :: problems summary ::
>
> [ivy:retrieve]  WARNINGS
>
> [ivy:retrieve] ::
>
> [ivy:retrieve] ::  UNRESOLVED DEPENDENCIES ::
>
> [ivy:retrieve] ::
>
> [ivy:retrieve] :: com.ibm.icu#icu4j;59.1: configuration not found in
> com.ibm.icu#icu4j;59.1: 'master'. It was required from
> org.apache.lucene#analyzers-icu;working@ROSELCDV0001LJC compile
>
> [ivy:retrieve] ::
>
> [ivy:retrieve]
>
> [ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>
>
> BUILD FAILED
>
> /Users/kumars5/src/git/code/lucene-solr/build.xml:300: The following error
> occurred while executing this line:
>
> /Users/kumars5/src/git/code/lucene-solr/lucene/build.xml:130: The
> following
> error occurred while executing this line:
>
> /Users/kumars5/src/git/code/lucene-solr/lucene/common-build.xml:2179: The
> following error occurred while executing this line:
>
> /Users/kumars5/src/git/code/lucene-solr/lucene/analysis/build.xml:91: The
> following error occurred while executing this line:
>
> /Users/kumars5/src/git/code/lucene-solr/lucene/analysis/build.xml:38: The
> following error occurred while executing this line:
>
> /Users/kumars5/src/git/code/lucene-solr/lucene/common-build.xml:409:
> impossible to resolve dependencies:
>
> resolve failed - see output for details
>


Re: When to use LTR

2017-06-22 Thread Ryan Yacyshyn
Thanks for your help Alessandro!

Ryan


On Wed, 21 Jun 2017 at 19:25 alessandro.benedetti 
wrote:

> Hi Ryan,
> The first thing to know is that Learning To Rank is about relevancy;
> specifically, it is about improving your relevancy function.
> Deciding whether or not to use LTR has nothing to do with your index size
> or update frequency (although LTR brings some performance considerations
> you will need to evaluate).
>
> Functionally, the moment you realize you want LTR is when you start tuning
> your relevancy.
> Normally the first approach is the manual one: you identify a set of
> features interesting for your use case, and you tune a boosting function
> to improve your search experience.
>
> e.g.
> you decide to weight the title field more than the content, and then boost
> recent documents.
>
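> Concretely, such a hand-tuned boost might look like this in SolrJ (the
> field names, date field and weights here are purely illustrative):
>
>     // import org.apache.solr.client.solrj.SolrQuery;
>     SolrQuery q = new SolrQuery("laptop");
>     q.set("defType", "edismax");
>     q.set("qf", "title^2 content");  // title weighted 2x over content
>     q.set("boost", "recip(ms(NOW,publish_date),3.16e-11,1,1)");  // recency
>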
> What happens next is:
> "How much more should I weight the title?"
> "How much should I boost recent documents?"
>
> Normally you just check some golden queries and try to optimise these
> boosting factors by hand.
>
> LTR answers this need.
> To put it simply, LTR will give you a model that tells you the best
> weighting factors, given your domain (and past experience), to return the
> most relevant results for all queries (this is the ideal; of course it is
> quite complicated and depends on a lot of factors).
>
> Of course it doesn't work like magic: you will need to extensively design
> your features (feature engineering), build a valid training set (explicit
> or implicit), decide the model that best suits your needs (linear model or
> tree-based?), and take care of a lot of corollary configuration.
>
> hope this helps!
>
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/When-to-use-LTR-tp4342130p4342140.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr Search Problem with Multiple Data-Import Handler

2017-06-22 Thread Josh Lincoln
I suspect Erick's right that clean=true is the problem. That's the default
in the DIH interface.


I find that when using DIH, it's best to set preImportDeleteQuery for every
entity. This safely scopes the clean variable to just that entity.
It doesn't look like the docs have examples of using preImportDeleteQuery,
so I put one here:
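
A minimal sketch (the entity, query, and datasource marker value are all
illustrative):

<entity name="dbItems"
        query="SELECT id, title FROM items"
        preImportDeleteQuery="datasource:db">
  <field column="id" name="id"/>
  <field column="title" name="title"/>
</entity>

With clean=true, only documents matching datasource:db are deleted before
this entity imports; each document needs such a marker field indexed (e.g.
added via a TemplateTransformer).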




On Wed, Jun 21, 2017 at 7:48 PM Erick Erickson 
wrote:

> First place I'd look is whether the jobs have clean=true set. If so, the
> first thing DIH does is delete all documents.
>
> Best,
> Erick
>
> On Wed, Jun 21, 2017 at 3:52 PM, Pandey Brahmdev 
> wrote:
>
> > Hi,
> > I have set up Apache Solr 6.6.0 on Windows 10, 64-bit.
> >
> > I have created a simple core and configured two data-import handlers
> > in the solrconfig.xml file.
> >
> > The first connects to a DB and pulls data from DB tables; the second
> > pulls data from all PDF files using TikaEntityProcessor.
> >
> > Now the problem: there is no error in the console or anywhere, but
> > whenever I search using the "Query" tab, I only get results from the
> > most recent data import.
> >
> > So if I last imported data from the tables, I get results from the
> > tables, and if I imported the PDF files, it only searches inside the
> > PDF files.
> >
> > But when I then want to search the DB table values again, I get no
> > results; instead I have to re-import the data with the DB handler, and
> > vice versa for the file handler.
> >
> > Can you please help me out here?
> > Very sorry if I am doing anything wrong, as I started using Apache Solr
> > only 2 days ago.
> >
> > Thanks & Regards,
> > Brahmdev Pandey
> > +46 767086309
> >
>


Re: Error after moving index

2017-06-22 Thread Michael Kuhlmann
Hi Moritz,

did you stop your local Solr server first? Copying data from a running
instance may cause headaches.

If yes, what happens if you copy everything again? It seems your copy
operation wasn't successful; the "read past EOF" in your trace suggests the
segments file arrived truncated, so comparing file sizes or checksums on
both machines would be a quick sanity check.

Best,
Michael

Am 22.06.2017 um 14:37 schrieb Moritz Munte:
> Hello,
>
>  
>
> I created an index on my local machine (Windows 10) and it works fine there.
>
> After uploading the index to the production server (Linux), the server shows
> an error:
.


Error after moving index

2017-06-22 Thread Moritz Munte
Hello,

 

I created an index on my local machine (Windows 10) and it works fine there.

After uploading the index to the production server (Linux), the server shows
an error:

 

java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [contentselect_v3]
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.solr.core.CoreContainer.lambda$load$6(CoreContainer.java:581)
    at org.apache.solr.core.CoreContainer$$Lambda$107/382729.run(Unknown Source)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$106/7410549.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Unable to create core [contentselect_v3]
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:933)
    at org.apache.solr.core.CoreContainer.lambda$load$5(CoreContainer.java:553)
    at org.apache.solr.core.CoreContainer$$Lambda$105/10330637.call(Unknown Source)
    at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
    ... 6 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:965)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:831)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:918)
    ... 9 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2032)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2152)
    at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1054)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:938)
    ... 11 more
Caused by: org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/index/segments_2e5r")))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:928)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:118)
    at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:93)
    at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:248)
    at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:122)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1993)
    ... 14 more
Caused by: java.io.EOFException: read past EOF: NIOFSIndexInput(path="/data/index/segments_2e5r")
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:336)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:140)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:116)
    at org.apache.lucene.store.BufferedChecksumIndexInput.readBytes(BufferedChecksumIndexInput.java:49)
    at org.apache.lucene.store.DataInput.readString(DataInput.java:237)
    at org.apache.lucene.store.DataInput.readMapOfStrings(DataInput.java:287)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:402)
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:288)
    ... 20 more

 

 

 

 

After doing checkIndex on the index, I get this error:

 

ERROR: could not read any segments file in directory
org.apache.lucene.index.CorruptIndexException: Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/data/index/segments_2e5r")))
    at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:290)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:527)
    at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2769)
    at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2671)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2597)
Caused by: java.io.EOFException: read past EOF:
NIOFSIndexInput(path="/data/index/segments_2e

com.ibm.icu dependency errors when building solr source code

2017-06-22 Thread Susheel Kumar
Hello,

Am I missing something, or is the source code broken? I took the latest
code from master, and when running "ant eclipse" or "ant test" I get the
error below.

ivy-configure:

[ivy:configure] :: loading settings :: file =
/Users/kumars5/src/git/code/lucene-solr/lucene/top-level-ivy-settings.xml


resolve:

[ivy:retrieve]

[ivy:retrieve] :: problems summary ::

[ivy:retrieve]  WARNINGS

[ivy:retrieve] ::

[ivy:retrieve] ::  UNRESOLVED DEPENDENCIES ::

[ivy:retrieve] ::

[ivy:retrieve] :: com.ibm.icu#icu4j;59.1: configuration not found in
com.ibm.icu#icu4j;59.1: 'master'. It was required from
org.apache.lucene#analyzers-icu;working@ROSELCDV0001LJC compile

[ivy:retrieve] ::

[ivy:retrieve]

[ivy:retrieve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS


BUILD FAILED

/Users/kumars5/src/git/code/lucene-solr/build.xml:300: The following error
occurred while executing this line:

/Users/kumars5/src/git/code/lucene-solr/lucene/build.xml:130: The following
error occurred while executing this line:

/Users/kumars5/src/git/code/lucene-solr/lucene/common-build.xml:2179: The
following error occurred while executing this line:

/Users/kumars5/src/git/code/lucene-solr/lucene/analysis/build.xml:91: The
following error occurred while executing this line:

/Users/kumars5/src/git/code/lucene-solr/lucene/analysis/build.xml:38: The
following error occurred while executing this line:

/Users/kumars5/src/git/code/lucene-solr/lucene/common-build.xml:409:
impossible to resolve dependencies:

resolve failed - see output for details


Re: Is it possible to support context filtering for FuzzyLookupFactory?

2017-06-22 Thread Georg Sorst
That would indeed be great! Does anyone know if there is a specific reason
for this or has it just not been implemented?
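
For reference, with one of the supported lookups the context field is wired
in roughly like this (component, field and context names are illustrative):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="contextField">cat</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

and then queried with something like suggest.q=mem&suggest.cfq=electronics.
FuzzyLookupFactory currently offers no equivalent hook.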

Jeffery Yuan  wrote on Tue., 20 June 2017, 22:54:

>
> FuzzyLookupFactory is great, as it can still find matches even when users
> misspell.
>
> Context filtering is also great, as we can show only the suggestions that
> match the user's language, doc type, etc.
>
> But it's a pity that FuzzyLookupFactory and context filtering (apparently)
> don't work together.
>
> https://cwiki.apache.org/confluence/display/solr/Suggester
> Context filtering lets you filter suggestions by a separate context field,
> such as category, department or any other token. The
> AnalyzingInfixLookupFactory and BlendedInfixLookupFactory currently support
> this feature, when backed by DocumentDictionaryFactory.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Is-it-possible-to-support-context-filtering-for-FuzzyLookupFactory-tp4342051.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


MoreLikeThis Clarifications

2017-06-22 Thread Max Bridgewater
I am trying to confirm my understanding of MLT after going through
following page:
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis.

Three approaches are mentioned:

1) Use it as a request handler and send text to the MoreLikeThis request
handler as needed.
2) Use it as a search component and MLT is performed on every document
returned
3) Use it as a request handler, but with externally supplied text.


What are example queries for each case, and what config changes does each
case require?
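
For instance, my guesses for approaches 1 and 3, assuming a
MoreLikeThisHandler registered at /mlt (the handler name and the fields are
my assumption):

/solr/mycore/mlt?q=id:123&mlt.fl=title,body
/solr/mycore/mlt?stream.body=some+free+text&mlt.fl=title,body

Are those roughly right?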

There is also MLTQParser. When should I use this parser as opposed to any
of the three approaches above?

Thanks,
Max.


Re: Mixing distrib=true and false in one request handler?

2017-06-22 Thread alessandro.benedetti
The short answer seems to be no [1].

On the other hand, I took part in a couple of related Jira discussions in
the past, as I (and other people) believe we should always return unique
suggestions anyway [2].

Although a year has passed, neither I nor others have actually made
progress on that issue :(







[1] org.apache.solr.spelling.suggest.SuggesterParams
[2] https://issues.apache.org/jira/browse/SOLR-8672 and mostly
https://issues.apache.org/jira/browse/LUCENE-6336



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mixing-distrib-true-and-false-in-one-request-handler-tp4342229p4342310.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to synchronize the imports (DIH) delta imports

2017-06-22 Thread Mikhail Khludnev
Ok. It should be something like the sketch below. It requests the
/dataimport status on the "db" core and yields the string "idle" when no
import is running:

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

HttpSolrClient solrj = new HttpSolrClient.Builder()
    .withBaseSolrUrl("http://localhost:8983/solr/")
    .build();

// Ask the DataImportHandler on the "db" core for its current status.
ModifiableSolrParams params = new ModifiableSolrParams();
params.add("command", "STATUS");
NamedList<Object> statusRsp = solrj.request(
    new GenericSolrRequest(SolrRequest.METHOD.GET, "/dataimport", params),
    "db");

statusRsp.get("status");  // "idle" when the handler is free
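
From there a caller can poll until the handler is idle again, e.g. (the
interval is arbitrary; InterruptedException handling omitted):

while (!"idle".equals(String.valueOf(statusRsp.get("status")))) {
    Thread.sleep(5000);  // wait a bit before asking again
    statusRsp = solrj.request(
        new GenericSolrRequest(SolrRequest.METHOD.GET, "/dataimport", params),
        "db");
}
// now it's safe to trigger the next import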



On Wed, Jun 21, 2017 at 10:10 AM, Srinivas Kashyap 
wrote:

> Thanks Mikhail,
>
> Can you please explain? How can it be done in SolrJ?
>
> Thanks and Regards,
> Srinivas Kashyap
> Senior Software Engineer
> “GURUDAS HERITAGE”
> 100 Feet Ring Road,
> Kadirenahalli,
> Banashankari 2nd Stage,
> Bangalore-560070
> P:  973-986-6105
> Bamboo Rose
> The only B2B marketplace powered by proven trade engines.
> www.BambooRose.com
>
> Make Retail. Fun. Connected. Easier. Smarter. Together. Better.
>
> -Original Message-
> From: Mikhail Khludnev [mailto:m...@apache.org]
> Sent: 21 June 2017 11:57 AM
> To: solr-user 
> Subject: Re: How to synchronize the imports (DIH) delta imports
>
> Hello, Srinivas.
>
> You can literally poll import status.
>
> On Wed, Jun 21, 2017 at 7:41 AM, Srinivas Kashyap  >
> wrote:
>
> > Hello,
> >
> > We have an architecture with an index server, where delta-imports run
> > periodically based on the modify_ts of the records.
> >
> > We have another adhoc import handler on each core, where an import is
> > triggered based on the key of the Solr core. This adhoc import is also
> > called periodically.
> >
> > We have a scenario where multiple records are picked up for the adhoc
> > import and the index server starts indexing them sequentially. If
> > another adhoc import command is issued in the meantime, the new records
> > are skipped (not indexed) because the Solr core is still busy
> > re-indexing the earlier records.
> >
> > Is there a way we can poll the import status of the index server in
> > SolrJ, so that we can refrain from sending another adhoc import command
> > while the import is still running?
> >
> > Thanks and Regards,
> > Srinivas Kashyap
> > Senior Software Engineer
> > "GURUDAS HERITAGE"
> > 100 Feet Ring Road,
> > Kadirenahalli,
> > Banashankari 2nd Stage,
> > Bangalore-560070
> > P:  973-986-6105
> > Bamboo Rose
> > The only B2B marketplace powered by proven trade engines.
> > www.BambooRose.com
> >
> > Make Retail. Fun. Connected. Easier. Smarter. Together. Better.
> >
> > 
> >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> 
>
>



-- 
Sincerely yours
Mikhail Khludnev