Re: Indexing logs in Solr

2016-06-05 Thread Anil
No external application will be integrated with our application.

Is there any custom Solr highlighter? Thanks.

On 6 June 2016 at 04:12, Joe Lawson 
wrote:

> Flume and Logstash can both ship to Solr.
> On Jun 5, 2016 2:11 PM, "Otis Gospodnetic" 
> wrote:
>
> > You can ship SOLR logs to Logsene or any other log management service and
> > not worry too much about their storage/size.
> >
> > Otis
> >
> > > On Jun 5, 2016, at 02:08, Anil  wrote:
> > >
> > > Hi ,
> > >
> > > i would like to index logs using to enable search on it in our
> > application.
> > >
> > > The problem would be index and stored size as log files size would go
> > upto
> > > terabytes.
> > >
> > > is there any way to use highlight feature without storing ?
> > >
> > > i found following link where Benedetti Alessandro mentioned about
> custom
> > > highlighter on url field.
> > >
> > >
> >
> http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html
> > >
> > > Any ideas would be helpful. Thanks.
> > >
> > > Cheers,
> > > Anil
> >
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-05 Thread John Bickerstaff
Yes, query parameters/modifications mentioned in the readme.  Beyond those
I don't have useful advice at this point
On Jun 4, 2016 10:56 PM, "MaryJo Sminkey"  wrote:

> On Sat, Jun 4, 2016 at 11:47 PM, John Bickerstaff <
> j...@johnbickerstaff.com>
> wrote:
>
> > MaryJo - I'm on vacation but can't resist... iirc there are some very
> > useful query modifications suggested in the readme on the github for the
> > plugin... can't access right now.
> >
>
>
> I'm assuming you mean the various query parameters. The only ones I see in
> there that would be of use for me are the ones I'm already using. As far as
> can tell from their description.
>
> MJ
>
>
>


Re: Indexing logs in Solr

2016-06-05 Thread Joe Lawson
Flume and Logstash can both ship to Solr.
On Jun 5, 2016 2:11 PM, "Otis Gospodnetic" 
wrote:

> You can ship SOLR logs to Logsene or any other log management service and
> not worry too much about their storage/size.
>
> Otis
>
> > On Jun 5, 2016, at 02:08, Anil  wrote:
> >
> > Hi ,
> >
> > i would like to index logs using to enable search on it in our
> application.
> >
> > The problem would be index and stored size as log files size would go
> upto
> > terabytes.
> >
> > is there any way to use highlight feature without storing ?
> >
> > i found following link where Benedetti Alessandro mentioned about custom
> > highlighter on url field.
> >
> >
> http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html
> >
> > Any ideas would be helpful. Thanks.
> >
> > Cheers,
> > Anil
>


Multilingual Solr

2016-06-05 Thread Riedl, Johannes
Hi all,

we are currently in search of a solution for switching between different 
languages in the query results and keeping the possibility to perform a search 
in several languages in parallel.  The overall aim would be a constant field 
name and an additional Solr parameter "lang=XX_YY" that allows returning the 
results in the chosen language while searches are applied to all languages. 
Setting up several cores to obtain a generic field name is not an option. Does 
anyone know of a clean way to achieve this, particularly routing content 
indexed to a generic field (e.g. title) to a "background field" (e.g. title_en, 
title_fr) etc on the fly and retrieving it from there depending on the language 
chosen.

Background: So far, we have investigated the multi-language field approach 
offered by Trey Grainger in the code examples for "Solr in Action" 
(https://github.com/treygrainger/solr-in-action.git, chapter 14), an extension 
to the ordinary textField that allows to use a generic field name and the 
language is encoded at the beginning of the field content and appropriate index 
and query analyzers associated to dummy fields in schema.xml. If there is a way 
to store data in these dummy fields and additionally the lang parameter is 
added we might be done.

Thanks a lot, best regards

Johannes
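
One client-side way to approximate this, sketched here under assumptions rather than as a built-in Solr feature: rename generic fields to per-language fields before indexing, and at query time search all language fields while aliasing the chosen language's field back to the generic name via fl. The helper names, the LANGS list, the "id" field, and the use of the two-letter prefix of the lang value are hypothetical; the alias:field form of fl and the edismax qf parameter are standard Solr.

```python
# Hypothetical client-side routing between a generic field name and
# per-language "background" fields such as title_en and title_fr.

LANGS = ["en", "fr"]          # assumed set of indexed languages
GENERIC_FIELDS = ["title"]    # assumed generic field names


def route_document(doc, lang):
    """Return a copy of doc with generic fields renamed to <field>_<lang>."""
    code = lang.split("_")[0].lower()   # "en_US" -> "en"
    routed = {}
    for name, value in doc.items():
        if name in GENERIC_FIELDS:
            routed[f"{name}_{code}"] = value
        else:
            routed[name] = value
    return routed


def query_params(user_query, lang):
    """Search every language field, but return only the chosen language."""
    code = lang.split("_")[0].lower()
    qf = " ".join(f"{f}_{l}" for f in GENERIC_FIELDS for l in LANGS)
    # fl=title:title_en asks Solr to return title_en under the name "title".
    fl = ",".join(["id"] + [f"{f}:{f}_{code}" for f in GENERIC_FIELDS])
    return {"defType": "edismax", "q": user_query, "qf": qf, "fl": fl}
```

With this arrangement the application only ever sees the generic name "title", while the index keeps one field per language with its own analyzer.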


Re: Cloud Solr 5.3.1 + 6.0.1 cannot delete documents

2016-06-05 Thread Moritz Becker
I just checked the shards again (with =false) and it seems that
I was mistaken, the document does *not* reside in _different_ shards -
everything good in this respect.

However, I still have the issue that deleteById does not work whereas
deleteByQuery works. Specifically, the following line does *not* work:

UpdateResponse response = solrClient.deleteById(collection, );

And the following line works:

UpdateResponse response = solrClient.deleteByQuery(collection, "id:" +
);

I do not touch/change any other code when switching between these two
modes and in both scenarios I use CloudSolrClient.
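
For comparison, the two delete modes correspond to two different bodies in Solr's JSON update format. The sketch below only builds those bodies; it does not reproduce or diagnose the problem, and the document id 12535 is borrowed from the log snippet quoted further down purely as an illustration. The function names are hypothetical.

```python
import json


def delete_by_id_body(doc_id):
    """JSON update body that deletes a single document by its unique key."""
    return json.dumps({"delete": {"id": str(doc_id)}})


def delete_by_query_body(doc_id):
    """JSON update body that deletes whatever matches the query id:<doc_id>."""
    return json.dumps({"delete": {"query": f"id:{doc_id}"}})


print(delete_by_id_body(12535))
print(delete_by_query_body(12535))
```

Either body would be POSTed to the collection's /update handler with Content-Type application/json, which can help compare the two modes outside of CloudSolrClient.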

Am 31.05.2016 um 05:32 schrieb Erick Erickson:
> bq: I checked in the Solr Admin and noticed that the same document
> resided in both shards on the same node
>
> If this means two _different_ shards (as opposed to two replicas in
> the _same_ shard) showed the
> document, then that's the proverbial "smoking gun", somehow your setup
> isn't what you think
> it is, perhaps you are somehow using implicit routing and routing the
> doc with the same ID to
> two different shards?
>
> try querying each of your replicas with =false to see if the
> doc is somehow on two different
> shards. If so, I suspect that's the root of your problems and figuring
> out _how_ that happened
> is the next step I'd recommend.
>
> As to why the raw URL deletes should work and CloudSolrClient doesn't,
> CloudSolrClient
> tries to send updates only to the shard that they should end up on. So
> if your routing is
> odd or you somehow have the same doc on two shards, the "wrong" shard wouldn't
> see the delete. There's some speculation here BTW, I didn't trace
> through the code...
>
> But this functionality is tested in the unit tests
> (CloudSolrClientTest.java), so I suspect it's
> something odd in your setup
>
> Best,
> Erick
>
> On Mon, May 30, 2016 at 12:33 PM, Moritz Becker  wrote:
>> Hi,
>>
>> I have the following issue:
>> I initially started with a Solr 5.3.1 + Zookeeper 3.4.6 cloud setup with 2 
>> solr nodes and with one collection consisting of 2 shards and 2 replicas.
>>
>> I am accessing the cluster using the CloudSolrClient. When I tried to delete 
>> a document, no error occurred but after deletion and subsequent commit, the 
>> document was still available via index queries.
>> I checked in the Solr Admin and noticed that the same document resided in 
>> both shards on the same node which I thought was odd.
>> Also after deleting the collection and recreating it, the issue remained.
>>
>> Then I tried upgrading to latest Solr 6.0.1 with the same setup. Again, I 
>> recreated the collection but I still could not delete the documents. Here is 
>> a log snippet of the deletion attempt of a single document:
>>
>> 
>>
>> 126023 INFO  (qtp12209492-16) [c:cc5363_dm_documentversion s:shard1 
>> r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
>> o.a.s.u.p.LogUpdateProcessorFactory 
>> [cc5363_dm_documentversion_shard1_replica1]  webapp=/solr path=/update 
>> params={update.distrib=FROMLEADER=http://localhost:8983/solr/cc5363_dm_documentversion_shard1_replica2/=javabin=2}{delete=[12535
>>  (-1535773473331216384)]} 0 16
>> 126024 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
>> s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
>> o.a.s.u.DirectUpdateHandler2 start 
>> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
>> 126036 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
>> s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
>> o.a.s.c.SolrCore SolrIndexSearcher has not changed - not re-opening: 
>> org.apache.solr.search.SolrIndexSearcher
>> 126038 INFO  (commitScheduler-15-thread-1) [c:cc5363_dm_documentversion 
>> s:shard1 r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
>> o.a.s.u.DirectUpdateHandler2 end_commit_flush
>> 126049 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
>> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
>> o.a.s.u.DirectUpdateHandler2 start 
>> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>> 126050 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
>> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] 
>> o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
>> 126051 INFO  (qtp12209492-19) [c:cc5363_dm_documentversion s:shard1 
>> r:core_node4 x:cc5363_dm_documentversion_shard1_replica1] 
>> o.a.s.u.DirectUpdateHandler2 start 
>> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>> 126054 INFO  (qtp12209492-20) [c:cc5363_dm_documentversion s:shard2 
>> r:core_node1 x:cc5363_dm_documentversion_shard2_replica1] o.a.s.c.SolrCore 
>> SolrIndexSearcher has not changed - not 

Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
Hi Lee,

Maybe you can find a useful starting point at
https://issues.apache.org/jira/browse/SOLR-1397

Please consider contributing once you have something working.

Ahmet




On Sunday, June 5, 2016 10:37 PM, Justin Lee  wrote:
Thanks, yea, I looked at debug query too.  Unfortunately the output of
debug query doesn't quite do it.  For example, if you use a wildcard query,
it will simply explain the score associated with that wildcard query, not
the actual matching token.  In other words, if you search for "hour*" and
the actual matching text is "hours", debug query doesn't tell you that.
Instead, it just reports the score associated with "hour*".

The closest example I've ever found is this:

https://lucidworks.com/blog/2013/05/09/update-accessing-words-around-a-positional-match-in-lucene-4/

But this kind of approach won't let me use the full power of the Solr
ecosystem.  I'd basically be back to dealing with Lucene directly, which I
think is a step backwards.  I think the right approach is to write my own
SearchComponent, using the highlighter as a starting point.  But I wanted
to make sure there wasn't a simpler way.


On Sun, Jun 5, 2016 at 11:30 AM Ahmet Arslan 
wrote:

> Well debug query has the list of token that caused match.
> If i am not mistaken i read an example about span query and spans thing.
> It was listing the positions of the matches.
> Cannot find the example at the moment..
>
> Ahmet
>
>
>
> On Sunday, June 5, 2016 9:10 PM, Justin Lee 
> wrote:
> Thanks for the responses Alex and Ahmet.
>
> The TermVector component was the first thing I looked at, but what it gives
> you is offset information for every token in the document.  I'm trying to
> get a list of tokens that actually match the search query, and unless I'm
> missing something, the TermVector component doesn't give you that
> information.
>
> The TermSpans class does contain the right information, but again the hard
> part is: how do I reliably get a list of TokenSpans for the tokens that
> actually match the search query?  That's why I ended up in the highlighter
> source code, because the highlighter has to do just this in order to create
> snippets with accurate highlighting.
>
> Justin
>
>
> On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > May be org.apache.lucene.search.spans.TermSpans ?
> >
> >
> >
> > On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> > It sounds like TermVector component's output:
> >
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
> >
> > Perhaps with additional flags enabled (e.g. tv.offsets and/or
> > tv.positions).
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> >
> > On 5 June 2016 at 07:39, Justin Lee  wrote:
> > > Is anyone aware of a way of getting a list of each matching token and
> > their
> > > offsets after executing a search?  The reason I want to do this is
> > because
> > > I have the physical coordinates of each token in the original document
> > > stored out of band, and I want to be able to highlight in the original
> > > document.  I would really like to have Solr return the list of matching
> > > tokens because then things like stemming and phrase matching will work
> as
> > > expected. I'm thinking of something like the highlighter component,
> > except
> > > instead of returning html, it would return just the matching tokens and
> > > their offsets.
> > >
> > > I have googled high and low and can't seem to find an exact answer to
> > this
> > > question, so I have spent the last few days examining the internals of
> > the
> > > various highlighting classes in Solr and Lucene.  I think the bulk of
> the
> > > action is in WeightedSpanTermExtractor and its interaction with
> > > getBestTextFragments in the Highlighter class.  But before I spend
> > anymore
> > > time on this I thought I'd ask (1) whether anyone knows of an easier
> way
> > of
> > > doing this, and (2) whether I'm at least barking up the right tree.
> > >
> > > Thanks much,
> > > Justin
> >
>


Index time Dates format when time is not needed

2016-06-05 Thread Steven White
Hi everyone,

I'm using "solr.DateRangeField" data type to index my dates data and based
on [1] the format of the dates data is "YYYY-MM-DDThh:mm:ssZ".

In my case, I have no need to search on time, just dates.  I started by
indexing my dates data as "2016-06-01" but Solr threw an exception.  I then
changed my code to index the dates data as: "2016-06-01T00:00:00Z" and now
it works.

I have tested this new format and all is well so far, however I'm not sure
if the way I have done the padding is valid.  So, my question to the Solr
community is this: Is the format that I'm using correct (padding with "00")
or is there some other format I should have used that is better and more
optimal for my use case?

Thanks in advance.

Steve

[1] https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
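
The padding described above can be sketched as a small helper. This is only an illustration of the zero-padding approach, assuming the dates are meant as midnight UTC; the function name is hypothetical, and the sketch does not settle whether a shorter format is also accepted.

```python
from datetime import date, datetime, timezone


def to_solr_datetime(day):
    """Pad a YYYY-MM-DD string to midnight UTC in YYYY-MM-DDThh:mm:ssZ form."""
    parsed = date.fromisoformat(day)   # raises ValueError on malformed input
    midnight = datetime(parsed.year, parsed.month, parsed.day, tzinfo=timezone.utc)
    return midnight.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_solr_datetime("2016-06-01"))
```

Parsing the input first, rather than concatenating "T00:00:00Z" directly, rejects malformed dates before they reach the indexer.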


Re: Getting a list of matching terms and offsets

2016-06-05 Thread Justin Lee
Thanks, yea, I looked at debug query too.  Unfortunately the output of
debug query doesn't quite do it.  For example, if you use a wildcard query,
it will simply explain the score associated with that wildcard query, not
the actual matching token.  In other words, if you search for "hour*" and
the actual matching text is "hours", debug query doesn't tell you that.
Instead, it just reports the score associated with "hour*".

The closest example I've ever found is this:

https://lucidworks.com/blog/2013/05/09/update-accessing-words-around-a-positional-match-in-lucene-4/

But this kind of approach won't let me use the full power of the Solr
ecosystem.  I'd basically be back to dealing with Lucene directly, which I
think is a step backwards.  I think the right approach is to write my own
SearchComponent, using the highlighter as a starting point.  But I wanted
to make sure there wasn't a simpler way.
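
To make the desired output concrete, here is a toy illustration in plain Python of what "matching tokens plus offsets" would look like for the "hour*" example. It uses a naive regex tokenizer and fnmatch, not Solr's analysis chain or any Lucene API, so stemming and phrase matching are not covered; the function name is hypothetical.

```python
import fnmatch
import re


def matching_tokens(text, pattern):
    """Return (token, start, end) for each token that matches a wildcard pattern."""
    matches = []
    for m in re.finditer(r"\w+", text):
        token = m.group(0)
        # Compare case-insensitively, roughly what a lowercase filter would do.
        if fnmatch.fnmatchcase(token.lower(), pattern.lower()):
            matches.append((token, m.start(), m.end()))
    return matches


print(matching_tokens("Open 24 hours, every hour", "hour*"))
```

The character offsets are what would be joined against the out-of-band coordinates of each token in the original document.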

On Sun, Jun 5, 2016 at 11:30 AM Ahmet Arslan 
wrote:

> Well debug query has the list of token that caused match.
> If i am not mistaken i read an example about span query and spans thing.
> It was listing the positions of the matches.
> Cannot find the example at the moment..
>
> Ahmet
>
>
>
> On Sunday, June 5, 2016 9:10 PM, Justin Lee 
> wrote:
> Thanks for the responses Alex and Ahmet.
>
> The TermVector component was the first thing I looked at, but what it gives
> you is offset information for every token in the document.  I'm trying to
> get a list of tokens that actually match the search query, and unless I'm
> missing something, the TermVector component doesn't give you that
> information.
>
> The TermSpans class does contain the right information, but again the hard
> part is: how do I reliably get a list of TokenSpans for the tokens that
> actually match the search query?  That's why I ended up in the highlighter
> source code, because the highlighter has to do just this in order to create
> snippets with accurate highlighting.
>
> Justin
>
>
> On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan 
> wrote:
>
> > Hi,
> >
> > May be org.apache.lucene.search.spans.TermSpans ?
> >
> >
> >
> > On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> > It sounds like TermVector component's output:
> >
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
> >
> > Perhaps with additional flags enabled (e.g. tv.offsets and/or
> > tv.positions).
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> >
> > On 5 June 2016 at 07:39, Justin Lee  wrote:
> > > Is anyone aware of a way of getting a list of each matching token and
> > their
> > > offsets after executing a search?  The reason I want to do this is
> > because
> > > I have the physical coordinates of each token in the original document
> > > stored out of band, and I want to be able to highlight in the original
> > > document.  I would really like to have Solr return the list of matching
> > > tokens because then things like stemming and phrase matching will work
> as
> > > expected. I'm thinking of something like the highlighter component,
> > except
> > > instead of returning html, it would return just the matching tokens and
> > > their offsets.
> > >
> > > I have googled high and low and can't seem to find an exact answer to
> > this
> > > question, so I have spent the last few days examining the internals of
> > the
> > > various highlighting classes in Solr and Lucene.  I think the bulk of
> the
> > > action is in WeightedSpanTermExtractor and its interaction with
> > > getBestTextFragments in the Highlighter class.  But before I spend
> > anymore
> > > time on this I thought I'd ask (1) whether anyone knows of an easier
> way
> > of
> > > doing this, and (2) whether I'm at least barking up the right tree.
> > >
> > > Thanks much,
> > > Justin
> >
>


Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
Well, debug query has the list of tokens that caused the match.
If I am not mistaken, I read an example about span queries and spans.
It listed the positions of the matches.
I cannot find the example at the moment.

Ahmet



On Sunday, June 5, 2016 9:10 PM, Justin Lee  wrote:
Thanks for the responses Alex and Ahmet.

The TermVector component was the first thing I looked at, but what it gives
you is offset information for every token in the document.  I'm trying to
get a list of tokens that actually match the search query, and unless I'm
missing something, the TermVector component doesn't give you that
information.

The TermSpans class does contain the right information, but again the hard
part is: how do I reliably get a list of TokenSpans for the tokens that
actually match the search query?  That's why I ended up in the highlighter
source code, because the highlighter has to do just this in order to create
snippets with accurate highlighting.

Justin


On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan 
wrote:

> Hi,
>
> May be org.apache.lucene.search.spans.TermSpans ?
>
>
>
> On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch 
> wrote:
> It sounds like TermVector component's output:
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
>
> Perhaps with additional flags enabled (e.g. tv.offsets and/or
> tv.positions).
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
>
> On 5 June 2016 at 07:39, Justin Lee  wrote:
> > Is anyone aware of a way of getting a list of each matching token and
> their
> > offsets after executing a search?  The reason I want to do this is
> because
> > I have the physical coordinates of each token in the original document
> > stored out of band, and I want to be able to highlight in the original
> > document.  I would really like to have Solr return the list of matching
> > tokens because then things like stemming and phrase matching will work as
> > expected. I'm thinking of something like the highlighter component,
> except
> > instead of returning html, it would return just the matching tokens and
> > their offsets.
> >
> > I have googled high and low and can't seem to find an exact answer to
> this
> > question, so I have spent the last few days examining the internals of
> the
> > various highlighting classes in Solr and Lucene.  I think the bulk of the
> > action is in WeightedSpanTermExtractor and its interaction with
> > getBestTextFragments in the Highlighter class.  But before I spend
> anymore
> > time on this I thought I'd ask (1) whether anyone knows of an easier way
> of
> > doing this, and (2) whether I'm at least barking up the right tree.
> >
> > Thanks much,
> > Justin
>


Re: Indexing logs in Solr

2016-06-05 Thread Otis Gospodnetic
You can ship SOLR logs to Logsene or any other log management service and not 
worry too much about their storage/size.

Otis

> On Jun 5, 2016, at 02:08, Anil  wrote:
> 
> Hi ,
> 
> i would like to index logs using to enable search on it in our application.
> 
> The problem would be index and stored size as log files size would go upto
> terabytes.
> 
> is there any way to use highlight feature without storing ?
> 
> i found following link where Benedetti Alessandro mentioned about custom
> highlighter on url field.
> 
> http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html
> 
> Any ideas would be helpful. Thanks.
> 
> Cheers,
> Anil


Help needed on Solr Streaming Expressions

2016-06-05 Thread Hui Liu
Hi,

  I have Solr 6.0.0 installed on my PC (windows 7), I was 
experimenting with 'Streaming Expression' feature by following steps from this 
link: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions, 
but cannot get it to work, attached is my solrconfig.xml and schema.xml, note I 
do have 'export' handler defined in my 'solrconfig.xml' and enabled all fields 
as 'docvalues' in 'schema.xml'; I am using solr cloud and external zookeeper 
(also installed on my PC), here is the command to start this 2-node Solr cloud 
instance and to create the collection 'document3':

-- start 2-node solr cloud instances:
solr start -c -z 127.0.0.1:2181 -p 8988 -s solr3
solr start -c -z 127.0.0.1:2181 -p 8989 -s solr4

-- create the collection:
solr create -c document3 -d new_doc_configs3 -p 8988 -s 2 -rf 2

  after creating the collection I loaded a few documents using 
'csv' format and I was able to query it using 'curl' command from my PC:

-- this works on my PC:
curl 
http://localhost:8988/solr/document3/select?q=*:*=document_id+desc,sender_msg_dest+desc=document_id,sender_msg_dest,recip_msg_dest

  but when trying Streaming 'search' using curl, it does not work, 
I tried with 3 different options: with zkHost, using 'export', or using 
'select', all getting the same error:

curl: (6) Could not resolve host: sort=document_id asc,qt=
{"result-set":{"docs":[
{"EXCEPTION":null,"EOF":true}]}}

-- different curl commands tried, all getting the same error above:
curl --data-urlencode 
'expr=search(document3,zkHost="127.0.0.1:2181",q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/export")' 
"http://localhost:8988/solr/document2/stream"

curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/export")' 
"http://localhost:8988/solr/document2/stream"

curl --data-urlencode 'expr=search(document3,q="*:*",fl="document_id, 
sender_msg_dest", sort="document_id asc",qt="/select",rows=10)' 
"http://localhost:8988/solr/document2/stream"

  what am I doing wrong? Thanks for any help!

Regards,
Hui Liu
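
One way to rule out shell quoting as a factor (the "Could not resolve host: sort=document_id asc,qt=" message suggests the expression was split at a space before it reached curl) is to URL-encode the expr parameter programmatically. A minimal sketch, assuming the collection document3 and port 8988 from the commands above; note that the target URL here uses document3, whereas the curl commands above post to document2. It only builds the request and does not send it.

```python
from urllib.parse import urlencode, parse_qs

STREAM_URL = "http://localhost:8988/solr/document3/stream"  # assumed target

expr = (
    'search(document3,q="*:*",fl="document_id,sender_msg_dest",'
    'sort="document_id asc",qt="/export")'
)

# urlencode escapes spaces and quotes, so the expression travels as one value.
body = urlencode({"expr": expr})
print(STREAM_URL)
print(body)
```

The resulting body can be sent as a POST with Content-Type application/x-www-form-urlencoded, for example with urllib.request.urlopen(STREAM_URL, data=body.encode()).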





  

  
[Attachments: solrconfig.xml and schema.xml. The XML markup was lost in the
archive; surviving values include 6.0.0, ${solr.data.dir:},
${solr.lock.type:native}, ${solr.autoCommit.maxTime:15000}, {!xport}, xsort,
and document_id.]



Re: Getting a list of matching terms and offsets

2016-06-05 Thread Justin Lee
Thanks for the responses Alex and Ahmet.

The TermVector component was the first thing I looked at, but what it gives
you is offset information for every token in the document.  I'm trying to
get a list of tokens that actually match the search query, and unless I'm
missing something, the TermVector component doesn't give you that
information.

The TermSpans class does contain the right information, but again the hard
part is: how do I reliably get a list of TokenSpans for the tokens that
actually match the search query?  That's why I ended up in the highlighter
source code, because the highlighter has to do just this in order to create
snippets with accurate highlighting.

Justin

On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan 
wrote:

> Hi,
>
> May be org.apache.lucene.search.spans.TermSpans ?
>
>
>
> On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch 
> wrote:
> It sounds like TermVector component's output:
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
>
> Perhaps with additional flags enabled (e.g. tv.offsets and/or
> tv.positions).
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
>
> On 5 June 2016 at 07:39, Justin Lee  wrote:
> > Is anyone aware of a way of getting a list of each matching token and
> their
> > offsets after executing a search?  The reason I want to do this is
> because
> > I have the physical coordinates of each token in the original document
> > stored out of band, and I want to be able to highlight in the original
> > document.  I would really like to have Solr return the list of matching
> > tokens because then things like stemming and phrase matching will work as
> > expected. I'm thinking of something like the highlighter component,
> except
> > instead of returning html, it would return just the matching tokens and
> > their offsets.
> >
> > I have googled high and low and can't seem to find an exact answer to
> this
> > question, so I have spent the last few days examining the internals of
> the
> > various highlighting classes in Solr and Lucene.  I think the bulk of the
> > action is in WeightedSpanTermExtractor and its interaction with
> > getBestTextFragments in the Highlighter class.  But before I spend
> anymore
> > time on this I thought I'd ask (1) whether anyone knows of an easier way
> of
> > doing this, and (2) whether I'm at least barking up the right tree.
> >
> > Thanks much,
> > Justin
>


language configuration in update extract request handler

2016-06-05 Thread SIDDHAST® Roshan
Hi All,

We are using an application for indexing and searching text with
Solr. We referred to the guide posted at
http://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/

Problem: we want to index Hindi images. We would like to know how to set
Tesseract configuration parameters via Tika or external params.

-- 
Roshan Agarwal
Siddhast®
907 chandra vihar colony
Jhansi-284002
M:+917376314900


Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
Hi,

Maybe org.apache.lucene.search.spans.TermSpans?



On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch  
wrote:
It sounds like TermVector component's output:
https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component

Perhaps with additional flags enabled (e.g. tv.offsets and/or tv.positions).

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/



On 5 June 2016 at 07:39, Justin Lee  wrote:
> Is anyone aware of a way of getting a list of each matching token and their
> offsets after executing a search?  The reason I want to do this is because
> I have the physical coordinates of each token in the original document
> stored out of band, and I want to be able to highlight in the original
> document.  I would really like to have Solr return the list of matching
> tokens because then things like stemming and phrase matching will work as
> expected. I'm thinking of something like the highlighter component, except
> instead of returning html, it would return just the matching tokens and
> their offsets.
>
> I have googled high and low and can't seem to find an exact answer to this
> question, so I have spent the last few days examining the internals of the
> various highlighting classes in Solr and Lucene.  I think the bulk of the
> action is in WeightedSpanTermExtractor and its interaction with
> getBestTextFragments in the Highlighter class.  But before I spend anymore
> time on this I thought I'd ask (1) whether anyone knows of an easier way of
> doing this, and (2) whether I'm at least barking up the right tree.
>
> Thanks much,
> Justin


Re: Stemming Help

2016-06-05 Thread Doug Turnbull
What output are you seeing exactly from the analysis UI?

It's also interesting you're not lowercasing after tokenization.
On Sun, Jun 5, 2016 at 10:42 AM Georg Sorst  wrote:

> Without having more context:
>
> How do you know that it is not working?
> What is the output you are getting in the analysis tool?
> Do the analysis steps in the output match your configuration?
> Are you sure you selected the right field / field type before running the
> analysis?
>
> Jamal, Sarfaraz  schrieb am
> Fr., 3. Juni 2016 um 20:12 Uhr:
>
> > Hi Guys,
> >
> > I am following this tutorial:
> >
> >
> http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/
> >
> > My (Managed) Schema file looks like this: (in the appropriate places)
> >
> >
> > [Schema excerpt: the field and fieldType definitions lost their XML
> > markup in the archive. Surviving attributes: stored="true" and
> > positionIncrementGap="100".]
> >
> > I have re-indexed everything -
> >
> > It is not effecting my search at all -
> >
> > - from what I can tell from the analysis tool nothing is happening.
> >
> > Is there something else I am missing or should take a look at, or is it
> > possible to debug this? Or some other documentation I can search though?
> >
> > Thanks!
> >
> > Sas
> >
> > -Original Message-
> > From: Shawn Heisey [mailto:apa...@elyograg.org]
> > Sent: Friday, June 3, 2016 2:02 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: [E] Re: Stemming and Managed Schema
> >
> > On 6/3/2016 9:22 AM, Jamal, Sarfaraz wrote:
> > > I would edit the managed-schema, make my changes, shutdown solr? And
> > > start it back up and verify it is still there?
> >
> > That's the sledgehammer approach.  Simple and effective, but Solr does go
> > offline for a short time.
> >
> > > Or is there another way to reload the core/collection?
> >
> > For SolrCloud:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2
> >
> > For non-cloud mode:
> >
> >
> https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Stemming Help

2016-06-05 Thread Georg Sorst
Without having more context:

How do you know that it is not working?
What is the output you are getting in the analysis tool?
Do the analysis steps in the output match your configuration?
Are you sure you selected the right field / field type before running the
analysis?
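
For anyone who wants to answer these questions outside the admin UI, here is
a minimal sketch that builds a request URL for Solr's field analysis handler.
It assumes the /analysis/field handler is reachable on the core; the base
URL, the core name "logs" and the field type name "text_stem" are
placeholders and not taken from this thread.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, index_text, query_text=None):
    """Build a URL for the field analysis handler of one core.

    base_url, core and field_type are whatever your installation uses;
    the values in the usage note below are hypothetical.
    """
    params = {
        "analysis.fieldtype": field_type,   # analyze by field type name
        "analysis.fieldvalue": index_text,  # text run through the index-time chain
        "wt": "json",
    }
    if query_text is not None:
        # Optional: text run through the query-time chain as well.
        params["analysis.query"] = query_text
    return f"{base_url.rstrip('/')}/{core}/analysis/field?{urlencode(params)}"
```

Calling field_analysis_url("http://localhost:8983/solr", "logs", "text_stem",
"running runs", "run") and opening the result in a browser should list the
tokens after each analysis step, which makes it easy to see whether the
stemming filter is applied at all.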

Jamal, Sarfaraz  wrote on
Fri, 3 June 2016 at 20:12:

> Hi Guys,
>
> I am following this tutorial:
>
> http://thinknook.com/keyword-stemming-and-lemmatisation-with-apache-solr-2013-08-02/
>
> My (Managed) Schema file looks like this: (in the appropriate places)
>
>
> -   stored="true" />
>
> -positionIncrementGap="100">
> 
> 
> 
> 
>   
>
>  -   stored="true" />
>
> -
>
> I have re-indexed everything -
>
> It is not affecting my search at all -
>
> - from what I can tell from the analysis tool nothing is happening.
>
> Is there something else I am missing or should take a look at, or is it
> possible to debug this? Or some other documentation I can search through?
>
> Thanks!
>
> Sas
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, June 3, 2016 2:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: [E] Re: Stemming and Managed Schema
>
> On 6/3/2016 9:22 AM, Jamal, Sarfaraz wrote:
> > I would edit the managed-schema, make my changes, shutdown solr? And
> > start it back up and verify it is still there?
>
> That's the sledgehammer approach.  Simple and effective, but Solr does go
> offline for a short time.
>
> > Or is there another way to reload the core/collection?
>
> For SolrCloud:
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api2
>
> For non-cloud mode:
>
> https://cwiki.apache.org/confluence/display/solr/CoreAdmin+API#CoreAdminAPI-RELOAD
>
> Thanks,
> Shawn
>
>


Re: Indexing logs in Solr

2016-06-05 Thread SIDDHAST® Roshan
Hi Anil,

As far as I know, storing is required for highlighting and KWIC in Solr.
One way around it is to not store the logs in Solr, retrieve them
directly from the file or DB, and highlight via regular expression.
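
A minimal sketch of that regular-expression approach, assuming the
application has already fetched the raw log line from the file or DB and
knows the user's search terms. The function name and the <em> markers are
illustrative choices, not anything Solr provides.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap each whole-word, case-insensitive match of any term in pre/post."""
    cleaned = [t for t in terms if t]
    if not cleaned:
        return text
    # Escape the terms so characters such as "." or "[" are matched literally.
    alternation = "|".join(re.escape(t) for t in cleaned)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


if __name__ == "__main__":
    print(highlight("ERROR: disk full on node7", ["error", "disk"]))
```

Note that this matches the literal query terms only. It does not repeat
Solr's analysis (stemming, synonyms), so a stemmed match found by Solr may
not be highlighted by the regular expression.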

Roshan


On 6/5/16, Anil  wrote:
> Thanks, Ilan. I will look into this.
> In our case, logs are attached to some application information and it's
> linked to other product information.
>
> Based on the log information, user will be navigated to other features of
> product. So we cannot directly decouple log search from our application.
>
> Thanks,
> Anil
>
> On 5 June 2016 at 11:42, Ilan Schwarts  wrote:
>
>> How about using "logstash" for this? I know its ES and not solr, but it
>> is
>> a free tool that is out there and no need to re-invent the wheel
>> On Jun 5, 2016 9:09 AM, "Anil"  wrote:
>>
>> > Hi ,
>> >
>> > i would like to index logs using to enable search on it in our
>> application.
>> >
>> > The problem would be index and stored size as log files size would go
>> upto
>> > terabytes.
>> >
>> > is there any way to use highlight feature without storing ?
>> >
>> > i found following link where Benedetti Alessandro mentioned about
>> > custom
>> > highlighter on url field.
>> >
>> >
>> >
>> http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html
>> >
>> > Any ideas would be helpful. Thanks.
>> >
>> > Cheers,
>> > Anil
>> >
>>
>


-- 

Roshan Agarwal
Director sales
Siddhast® Ip innovation (P) ltd
907 chandra vihar colony
Jhansi-284002
M:+917376314900


Re: Indexing logs in Solr

2016-06-05 Thread Anil
Thanks, Ilan. I will look into this.
In our case, logs are attached to some application information and it's
linked to other product information.

Based on the log information, the user will be navigated to other features of
the product, so we cannot directly decouple log search from our application.

Thanks,
Anil

On 5 June 2016 at 11:42, Ilan Schwarts  wrote:

> How about using "logstash" for this? I know its ES and not solr, but it is
> a free tool that is out there and no need to re-invent the wheel
> On Jun 5, 2016 9:09 AM, "Anil"  wrote:
>
> > Hi ,
> >
> > i would like to index logs using to enable search on it in our
> application.
> >
> > The problem would be index and stored size as log files size would go
> upto
> > terabytes.
> >
> > is there any way to use highlight feature without storing ?
> >
> > i found following link where Benedetti Alessandro mentioned about custom
> > highlighter on url field.
> >
> >
> >
> http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html
> >
> > Any ideas would be helpful. Thanks.
> >
> > Cheers,
> > Anil
> >
>


Re: Indexing logs in Solr

2016-06-05 Thread Ilan Schwarts
How about using "logstash" for this? I know it's ES and not Solr, but it is
a free tool that is out there, and there is no need to reinvent the wheel.
On Jun 5, 2016 9:09 AM, "Anil"  wrote:

> Hi ,
>
> i would like to index logs using to enable search on it in our application.
>
> The problem would be index and stored size as log files size would go upto
> terabytes.
>
> is there any way to use highlight feature without storing ?
>
> i found following link where Benedetti Alessandro mentioned about custom
> highlighter on url field.
>
>
> http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html
>
> Any ideas would be helpful. Thanks.
>
> Cheers,
> Anil
>


Indexing logs in Solr

2016-06-05 Thread Anil
Hi,

I would like to index logs using Solr to enable search on them in our
application.

The problem would be the index and stored size, as the log files could grow
up to terabytes.

Is there any way to use the highlight feature without storing?

I found the following link, where Benedetti Alessandro mentioned a custom
highlighter on a URL field.

http://lucene.472066.n3.nabble.com/Highlighting-for-non-stored-fields-td1773015.html

Any ideas would be helpful. Thanks.

Cheers,
Anil