Re: New comer - Benoit Vanalderweireldt

2016-02-25 Thread Erick Erickson
There are also other ways to help besides coding: documentation, Javadocs,
writing test cases (look at the code coverage reports and pick something
not already covered), reviewing or commenting on patches, or working on the
new AngularJS UI.

Try installing Solr, note any ambiguous docs, and suggest better ones; the
sky is the limit.

There's a chronic need for better JavaDocs.

In short, pick something about Solr/Lucene that bugs you and see what you
can do to improve it ;)

Welcome!
Erick
On Feb 26, 2016 14:07, "Shawn Heisey"  wrote:

> On 2/25/2016 4:34 PM, Benoit Vanalderweireldt wrote:
> > I have just joined this mailing list, I would love to contribute to
> Apache SOLR (I am a certified Java developer OCA and OCP)
> >
> > Can someone guide me and assign me a first task on Jira (my username is
> : b.vanalderweireldt) ?
>
> Thanks for stepping up and offering to help out!  Jan has given you some
> good starting points.  I had mostly written this message before that
> reply came through, so here's some more info:
>
> You'll want to join the dev list.  Most of the communication for a
> specific issue will happen in Jira, but the dev list offers a place for
> larger and miscellaneous discussions.  Warning: Because all Jira
> activity is sent to the dev list, it is a high-traffic list.  Having the
> ability to use filters on your email to direct messages to different
> folders is a life-saver.
>
> Your initial message would have been more at home on the dev list, but
> we're not terribly formal about enforcing that kind of separation.
> Initial discussion for many issues is welcome on this list, and often
> preferred before going to Jira.
>
> Normally issues are assigned to the committer that agrees to take on the
> change and commit it.
>
> Take a look at the many open issues on Solr.  You'll probably want to
> start with an issue that's recently filed, not one that was filed years
> ago.  After you become more comfortable with the codebase and the
> process, you'll be in a better position to tackle older issues.
>
>
> https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel
>
> A highly filtered and likely more relevant list:
>
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20labels%20in%20%28beginners%2C%20newdev%29%20AND%20status%20not%20in%20%28resolved%2C%20closed%29
>
> Thanks,
> Shawn
>
>


Re: importing 4.10.2 solr cloud repository to 5.4.1

2016-02-25 Thread Neeraj Bhatt
Thanks Shawn. My index meets the Atomic Update requirements, so I want to
use DIH because of its convenience.
I am running SolrCloud with 5 shards (and a separate ZooKeeper ensemble),
so will I have to define 5 entity tags, one for each shard's URL?
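
(For reference, a SolrEntityProcessor data-config sketch looks like the
following; the URL and entity name are illustrative, not from this thread:

  <dataConfig>
    <document>
      <entity name="sep" processor="SolrEntityProcessor"
              url="http://oldhost:8983/solr/collection1"
              query="*:*" rows="1000" fl="*"/>
    </document>
  </dataConfig>

One such entity per source URL would correspond to the per-shard approach
described above.)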

thanks
neeraj


On Wed, Feb 24, 2016 at 7:34 PM, Shawn Heisey  wrote:
> On 2/23/2016 11:10 PM, Neeraj Bhatt wrote:
>> Hello
>>
>> We have a SolrCloud with stored and indexed data of around 25 lakh (2.5
>> million) documents.
>> We recently moved to Solr 5.4.1 but are unable to move our indexed
>> data. Which approach should we follow?
>>
>> 1. Does the data import handler work in SolrCloud? What should we give in
>> the url, like url="http://192.168.34.218:8080/solr/client_sku_shard1_replica3"?
>> That URL names a single shard, so all documents won't be imported.
>
> SolrEntityProcessor in DIH will only work if your index meets the
> requirements for Atomic Updates.  Basically, every field must be stored,
> unless it is a copyField destination:
>
> https://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations
>
>> 2. Will directly copying the index work? There are some schema changes,
>> like from solr.Int to solr.TrieInt, etc.
>
> If the schema uses different classes, you will not be able to use the
> old index directly.  The schema would need to be completely unchanged,
> but it sounds like your old schema is using classes that are no longer
> present in 5.x.
>
>> 3. Write code to fetch from Solr 4.10.2 and push into 5.4.1. This is
>> time-consuming, though it can be improved with multithreading.
>
> This has the same requirements as SolrEntityProcessor.
>
> A complete reindex in 5.x from the original data source would be the
> best option, but if your index meets the Atomic Update requirements, you
> could go with one of the options that you numbered 1 or 3.
>
> Thanks,
> Shawn
>


Re: New comer - Benoit Vanalderweireldt

2016-02-25 Thread Shawn Heisey
On 2/25/2016 4:34 PM, Benoit Vanalderweireldt wrote:
> I have just joined this mailing list, I would love to contribute to Apache 
> SOLR (I am a certified Java developer OCA and OCP)
>
> Can someone guide me and assign me a first task on Jira (my username is : 
> b.vanalderweireldt) ?

Thanks for stepping up and offering to help out!  Jan has given you some
good starting points.  I had mostly written this message before that
reply came through, so here's some more info:

You'll want to join the dev list.  Most of the communication for a
specific issue will happen in Jira, but the dev list offers a place for
larger and miscellaneous discussions.  Warning: Because all Jira
activity is sent to the dev list, it is a high-traffic list.  Having the
ability to use filters on your email to direct messages to different
folders is a life-saver.

Your initial message would have been more at home on the dev list, but
we're not terribly formal about enforcing that kind of separation. 
Initial discussion for many issues is welcome on this list, and often
preferred before going to Jira.

Normally issues are assigned to the committer that agrees to take on the
change and commit it.

Take a look at the many open issues on Solr.  You'll probably want to
start with an issue that's recently filed, not one that was filed years
ago.  After you become more comfortable with the codebase and the
process, you'll be in a better position to tackle older issues.

https://issues.apache.org/jira/browse/SOLR/?selectedTab=com.atlassian.jira.jira-projects-plugin:issues-panel

A highly filtered and likely more relevant list:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20SOLR%20AND%20labels%20in%20%28beginners%2C%20newdev%29%20AND%20status%20not%20in%20%28resolved%2C%20closed%29

Thanks,
Shawn



Re: New comer - Benoit Vanalderweireldt

2016-02-25 Thread Jan Høydahl
Hi

Welcome and thanks for volunteering.
Here’s our guide for newcomers:

http://wiki.apache.org/solr/HowToContribute
Also see the web site http://lucene.apache.org/solr/resources.html#community

After reading this, please feel welcome to come back to ask further questions!
When it comes to assigning tasks in JIRA, we’re not that formal :) You can 
simply look at the open issues (start with those tagged “newdev”), and when you 
find something you’d like to work on, write a comment saying that you want to 
start looking at it. Once you have something, put up a patch (or a GitHub PR), 
receive feedback and iterate until it is ready for commit.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 26. feb. 2016 kl. 00.34 skrev Benoit Vanalderweireldt :
> 
> Dear solr users community,
> 
> I have just joined this mailing list, I would love to contribute to Apache 
> SOLR (I am a certified Java developer OCA and OCP)
> 
> Can someone guide me and assign me a first task on Jira (my username is : 
> b.vanalderweireldt) ?
> 
> Cheers
> 
> Benoit
> 



Re: Query time de-boost

2016-02-25 Thread Jack Krupansky
0.1 is a fractional boost - all intra-query boosts are multiplicative, not
additive, so term^0.1 reduces that term's score contribution by 90%.
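
For example (illustrative field and terms), in

  q=title:solr^1.0 title:nutch^0.1

a match on the second clause contributes one tenth of its normal score.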

-- Jack Krupansky

On Wed, Feb 24, 2016 at 11:29 AM, shamik  wrote:

> Binoy, 0.1 is still a positive boost. With title getting the highest
> weight,
> this won't make any difference. I've tried this as well.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Query-time-de-boost-tp4259309p4259552.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Query time de-boost

2016-02-25 Thread Walter Underwood
Another approach is to boost everything but that content.

This bq should work:

*:* -ContentGroup:"Developer's Documentation"

Or a function query in the boost parameter, with an if statement.

Or make ContentGroup an enum with different values for each group, and use a 
function query to boost by that value. 

I haven’t tried any of these, of course.
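
A sketch of the function-query variant (untested; the field name is assumed
to hold a single indexed term per document):

  boost=if(termfreq(ContentGroup,'Developer'),0.1,1)

This multiplies the score by 0.1 when the term is present and leaves it
unchanged otherwise.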

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 25, 2016, at 3:33 PM, Binoy Dalal  wrote:
> 
> According to the edismax documentation, negative boosts are supported, so
> you should certainly give it a try.
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
> 
> On Fri, 26 Feb 2016, 03:45 shamik  wrote:
> 
>> Emir, I don't think Solr supports a negative boosting *^-99* syntax like
>> this. I can certainly do something like:
>>
>> bq=(*:* -ContentGroup:"Developer's Documentation")^99 , but then I can't
>> have my other bq parameters.
>>
>> This doesn't work --> bq=Source:simplecontent^10 Source:Help^20 (*:*
>> -ContentGroup:"Developer's Documentation")^99
>>
>> Are you sure something like *bq=ContentGroup-local:Developer^-99* worked
>> for you?
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Query-time-de-boost-tp4259309p4259879.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 
> -- 
> Regards,
> Binoy Dalal



New comer - Benoit Vanalderweireldt

2016-02-25 Thread Benoit Vanalderweireldt
Dear solr users community,

I have just joined this mailing list, I would love to contribute to Apache SOLR 
(I am a certified Java developer OCA and OCP)

Can someone guide me and assign me a first task on Jira (my username is : 
b.vanalderweireldt) ?

Cheers

Benoit



Re: Query time de-boost

2016-02-25 Thread Binoy Dalal
According to the edismax documentation, negative boosts are supported, so
you should certainly give it a try.
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

On Fri, 26 Feb 2016, 03:45 shamik  wrote:

> Emir, I don't think Solr supports a negative boosting *^-99* syntax like
> this. I can certainly do something like:
>
> bq=(*:* -ContentGroup:"Developer's Documentation")^99 , but then I can't
> have my other bq parameters.
>
> This doesn't work --> bq=Source:simplecontent^10 Source:Help^20 (*:*
> -ContentGroup:"Developer's Documentation")^99
>
> Are you sure something like *bq=ContentGroup-local:Developer^-99* worked
> for you?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Query-time-de-boost-tp4259309p4259879.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: Why QueryWeight with Custom Similarity

2016-02-25 Thread Markus, Sascha
Sorry, clicked send too early :-)

With the above additional code, the calculation is done by the default
similarity and the behaviour is as expected.

I think this is an issue in the implementation, but I didn't find one in
Jira. Should I create one?

Cheers
 Sascha


On Thu, Feb 25, 2016 at 11:12 PM, Markus, Sascha 
wrote:

> Hi,
> I finally found the source of the problem I'm having with the custom
> similarity.
>
> The setting:
> - Solr 5.4.1
> - the SpecialSimilarity extends ClassicSimilarity
> - for one field this similarity is configured. Everything else uses
> ClassicSimilarity because of <similarity class="solr.SchemaSimilarityFactory"/>
>
> Result:
> - most calculation is done by the correct similarity for each field
> - but the method public float queryNorm(float valueForNormalization) is
> always called on the base class Similarity which always returns 1f
>
> My workaround:
> I changed PerFieldSimilarityWrapper and added
> @Override
> public float queryNorm(float valueForNormalization) {
>   return get("").queryNorm(valueForNormalization);
> }
>
>
>
>
> On Mon, Feb 15, 2016 at 10:28 AM, Markus, Sascha 
> wrote:
>
>> Hi,
>> I created a custom similarity and factory which extend
>> DefaultSimilarity/-Factory to customize the IDF calculation.
>>
>> To achieve this, my similarity overrides idfExplain like this (and also
>> the variant that takes an array of terms):
>> public Explanation idfExplain(CollectionStatistics collectionStats,
>>     TermStatistics termStats) {
>>   final long df = lookUpDocumentFrequency(termStats);
>>   final long max = getDocumentCountForIDF();
>>   final float idf = idf(df, max);
>>   log.debug("term:" + termStats.term()
>>       + " idf(docFreq=" + df + ", maxDocs=" + max + ") -> " + idf);
>>   return Explanation.match(idf, "idf(docFreq=" + df + ", maxDocs=" + max + ")");
>> }
>>
>> I configured my similarity for one field in the schema.
>>
>> Without my plugin the score just uses the fieldWeight.
>> But when my similarity is enabled scores are calculated with the
>> fieldWeight multiplied by a queryWeight.
>> And this is done for ALL FIELDS queried, not only the one with my
>> similarity.
>>
>> Why does this happen, and is there a way to get around it?
>> From a Solr point of view this is probably OK because the score is not
>> meant to be absolute.
>> But in our application a user may set a threshold which is used as a
>> filter query like {!frange l=0.31}query($q).
>>
>> Any hints?
>>
>> Cheers,
>>  Sascha
>>
>>
>>
>>
>


Re: Query time de-boost

2016-02-25 Thread shamik
Emir, I don't Solr supports a negative boosting *^-99* syntax like this. I
can certainly do something like:

bq=(*:* -ContetGroup:"Developer's Documentation")^99 , but then I can't have
my other bq parameters.

This doesn't work --> bq=Source:simplecontent^10 Source:Help^20 (*:*
-ContetGroup:"Developer's Documentation")^99

Are you sure something like *bq=ContenGroup-local:Developer^-99* worked for
you?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-time-de-boost-tp4259309p4259879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Why QueryWeight with Custom Similarity

2016-02-25 Thread Markus, Sascha
Hi,
I finally found the source of the problem I'm having with the custom
similarity.

The setting:
- Solr 5.4.1
- the SpecialSimilarity extends ClassicSimilarity
- for one field this similarity is configured. Everything else uses
ClassicSimilarity because of <similarity class="solr.SchemaSimilarityFactory"/>

Result:
- most calculation is done by the correct similarity for each field
- but the method public float queryNorm(float valueForNormalization) is
always called on the base class Similarity which always returns 1f

My workaround:
I changed PerFieldSimilarityWrapper and added
@Override
public float queryNorm(float valueForNormalization) {
  return get("").queryNorm(valueForNormalization);
}




On Mon, Feb 15, 2016 at 10:28 AM, Markus, Sascha 
wrote:

> Hi,
> I created a custom similarity and factory which extend
> DefaultSimilarity/-Factory to customize the IDF calculation.
>
> To achieve this, my similarity overrides idfExplain like this (and also
> the variant that takes an array of terms):
> public Explanation idfExplain(CollectionStatistics collectionStats,
>     TermStatistics termStats) {
>   final long df = lookUpDocumentFrequency(termStats);
>   final long max = getDocumentCountForIDF();
>   final float idf = idf(df, max);
>   log.debug("term:" + termStats.term()
>       + " idf(docFreq=" + df + ", maxDocs=" + max + ") -> " + idf);
>   return Explanation.match(idf, "idf(docFreq=" + df + ", maxDocs=" + max + ")");
> }
>
> I configured my similarity for one field in the schema.
>
> Without my plugin the score just uses the fieldWeight.
> But when my similarity is enabled scores are calculated with the
> fieldWeight multiplied by a queryWeight.
> And this is done for ALL FIELDS queried, not only the one with my
> similarity.
>
> Why does this happen, and is there a way to get around it?
> From a Solr point of view this is probably OK because the score is not
> meant to be absolute.
> But in our application a user may set a threshold which is used as a
> filter query like {!frange l=0.31}query($q).
>
> Any hints?
>
> Cheers,
>  Sascha
>
>
>
>


Re: Stopping Solr JVM on OOM

2016-02-25 Thread CP Mishra
Solr & Lucene dev folks must be catching Throwable for a reason. Anyway, I
am asking for solutions that I can use.
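
(For reference, the command-line hook under discussion is the JVM's
OnOutOfMemoryError option, which Solr's startup script wires up roughly like
this; paths and port are illustrative:

  java -XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs" ...

As noted in the thread, it only fires if the OutOfMemoryError actually
reaches the JVM instead of being caught and rethrown as a RuntimeException.)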

On Thu, Feb 25, 2016 at 3:06 PM, Fuad Efendi  wrote:

> The best practice: do not ever try to catch Throwable or its descendants
> Error, VirtualMachineError, OutOfMemoryError, etc.
>
> Never ever.
>
> Also, do not swallow InterruptedException in a loop.
>
> A few simple rules to avoid a hanging application. If we follow these, there
> will be no question of "what is the best way to stop Solr when it gets into an
> OOM state" (or of Solr just becoming unresponsive because of swallowed
> exceptions).
>
>
> --
> Fuad Efendi
> 416-993-2060(cell)
>
> On February 25, 2016 at 2:37:45 PM, CP Mishra (mishr...@gmail.com) wrote:
>
> Looking at the previous threads (and in our tests), the oom script specified
> on the command line does not work, as the OOM exception is trapped and
> converted to a RuntimeException. So, what is the best way to stop Solr when
> it gets into an OOM state? The only way I see is to override multiple
> handlers and call System.exit() from there. Is there a better way?
>
> We are using Solr with default Jetty container.
>
> Thanks,
> CP Mishra
>
>


Re: Stopping Solr JVM on OOM

2016-02-25 Thread Fuad Efendi
The best practice: do not ever try to catch Throwable or its descendants
Error, VirtualMachineError, OutOfMemoryError, etc.

Never ever.

Also, do not swallow InterruptedException in a loop.

A few simple rules to avoid a hanging application. If we follow these, there
will be no question of "what is the best way to stop Solr when it gets into an
OOM state" (or of Solr just becoming unresponsive because of swallowed
exceptions).


-- 
Fuad Efendi
416-993-2060(cell)

On February 25, 2016 at 2:37:45 PM, CP Mishra (mishr...@gmail.com) wrote:

Looking at the previous threads (and in our tests), the oom script specified on
the command line does not work, as the OOM exception is trapped and converted
to a RuntimeException. So, what is the best way to stop Solr when it gets into
an OOM state? The only way I see is to override multiple handlers and call
System.exit() from there. Is there a better way?

We are using Solr with default Jetty container.  

Thanks,  
CP Mishra  


Stopping Solr JVM on OOM

2016-02-25 Thread CP Mishra
Looking at the previous threads (and in our tests), the oom script specified on
the command line does not work, as the OOM exception is trapped and converted
to a RuntimeException. So, what is the best way to stop Solr when it gets into
an OOM state? The only way I see is to override multiple handlers and call
System.exit() from there. Is there a better way?

We are using Solr with default Jetty container.

Thanks,
CP Mishra


Storage of internal value

2016-02-25 Thread Jens Ivar Jørdre
Hi all

I am looking for ways of having the functionality of
https://issues.apache.org/jira/browse/SOLR-1997 on Solr 5.x. Is there an
alternate way to achieve this other than creating the field type suggested by
SOLR-1997? If that is not possible, would you be able to suggest references
for adding custom field types to Solr 5.x?

--
Sincerely,
Jens Ivar Jørdre
about.me/jijordre 

Timeout connecting to index replication master causing slave core failure (0 documents).

2016-02-25 Thread Russell McOrmond
We are running "5.4.0 1718046 - upayavira - 2015-12-04 23:16:46" on a
series of index replication slaves of a single master.

The master is behind a VPN connection to a slower network.  There are times
when that network might have timeouts, and we need our applications to be
robust against that type of temporary problem.

Yesterday we had a timeout that led to a Solr slave having 0 documents in
its core on a production machine, which is pretty serious for us.


Some relevant logs:

root@havarti:/data/solr/solr5-var/logs# grep " ERROR " solr.log.*
solr.log.2:2016-02-24 16:52:52.832 ERROR (indexFetcher-12-thread-1) [
x:cap2015] o.a.s.h.IndexFetcher Master at:
http://localname-removed:8983/solr/cap2015 is not available. Index fetch
failed. Exception: IOException occured when talking to server at:
http://localname-removed:8983/solr/cap2015
root@havarti:/data/solr/solr5-var/logs# grep -vE "(\/select|\/replication)"
solr.log.2
2016-02-24 16:52:52.832 ERROR (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.h.IndexFetcher Master at: http://localname-removed:8983/solr/cap2015
is not available. Index fetch failed. Exception: IOException occured when
talking to server at: http://localname-removed:8983/solr/cap2015
2016-02-24 16:53:52.093 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.u.DirectUpdateHandler2 start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2016-02-24 16:53:52.203 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.c.SolrDeletionPolicy SolrDeletionPolicy.onCommit: commits: num=2

commit{dir=NRTCachingDirectory(MMapDirectory@/data/solr/solr5-var/data/cap2015/data/index.2016000052555
lockFactory=org.apache.lucene.store.NativeFSLockFactory@4b8e51d2;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_ufj,generation=39439}

commit{dir=NRTCachingDirectory(MMapDirectory@/data/solr/solr5-var/data/cap2015/data/index.2016000052555
lockFactory=org.apache.lucene.store.NativeFSLockFactory@4b8e51d2;
maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_ufk,generation=39440}
2016-02-24 16:53:52.203 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.c.SolrDeletionPolicy newest commit generation = 39440
2016-02-24 16:53:52.215 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.s.SolrIndexSearcher Opening Searcher@4c16fd6[cap2015] main
2016-02-24 16:53:52.216 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.u.DirectUpdateHandler2 end_commit_flush
2016-02-24 16:53:52.216 INFO
 (searcherExecutor-7-thread-1-processing-x:cap2015) [   x:cap2015]
o.a.s.c.QuerySenderListener QuerySenderListener sending requests to
Searcher@4c16fd6[cap2015]
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
2016-02-24 16:53:52.216 INFO
 (searcherExecutor-7-thread-1-processing-x:cap2015) [   x:cap2015]
o.a.s.c.QuerySenderListener QuerySenderListener done.
2016-02-24 16:53:52.216 INFO
 (searcherExecutor-7-thread-1-processing-x:cap2015) [   x:cap2015]
o.a.s.c.SolrCore [cap2015] Registered new searcher Searcher@4c16fd6[cap2015]
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
root@havarti:/data/solr/solr5-var/logs#


Is there some setting I am missing to prevent this problem?  This doesn't
appear to be normal behaviour, and it is something I want to ensure is
prevented.


After spending over 24 hours replicating, the index was loaded, only to have
the slave attempt another complete copy a few minutes later, after getting a
similar temporary timeout error.


root@havarti:/data/solr/solr5-var/logs# grep -vE
"(\/select|\/replication|Slave in sync with master)" solr.log
2016-02-25 15:47:56.562 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.h.IndexFetcher Total time taken for download
(fullCopy=true,bytesDownloaded=98312405389) : 80512 secs (1221090
bytes/sec) to 
NRTCachingDirectory(MMapDirectory@/data/solr/solr5-var/data/cap2015/data/index.20160224172604265
lockFactory=org.apache.lucene.store.NativeFSLockFactory@44b0c913;
maxCacheMB=48.0 maxMergeSizeMB=4.0)
2016-02-25 15:47:56.667 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.h.IndexFetcher New index installed. Updating index properties...
index=index.20160224172604265
2016-02-25 15:47:56.685 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.h.IndexFetcher removing old index directory
NRTCachingDirectory(MMapDirectory@/data/solr/solr5-var/data/cap2015/data/index.2016000052555
lockFactory=org.apache.lucene.store.NativeFSLockFactory@44b0c913;
maxCacheMB=48.0 maxMergeSizeMB=4.0)
2016-02-25 15:47:56.685 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.u.DefaultSolrCoreState Rollback old IndexWriter... core=cap2015
2016-02-25 15:47:56.690 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.c.SolrCore New index directory detected:
old=/data/solr/solr5-var/data/cap2015/data/index.2016000052555
new=/data/solr/solr5-var/data/cap2015/data/index.20160224172604265
2016-02-25 15:47:56.775 INFO  (indexFetcher-12-thread-1) [   x:cap2015]
o.a.s.c.SolrDeletio

Re: Solr 6.0

2016-02-25 Thread Renaud Delbru

Hi Shawn,

On 25/02/16 14:07, Shawn Heisey wrote:

> The CDCR functionality is currently present in the master branch, but I
> do not know for sure whether it will be included in the 6.0 release.  I
> am not involved with that feature and have no idea how stable the code is.

CDCR is stable and has been running for months in a large production
deployment without any known issues.
Erick, who took care of committing it into the trunk, was planning to
release it as part of 6.0.

--
Renaud Delbru


Re: Solr 6.0

2016-02-25 Thread Yonik Seeley
On Thu, Feb 25, 2016 at 9:07 AM, Shawn Heisey  wrote:
> http://yonik.com/solr-6/

For those of you in the NYC area, I'm giving a talk soon on Solr 6
(and depending on the timing, "Preview" could turn into "Overview" :-)
NYC Apache Lucene/Solr Meetup
Solr 6 Feature Preview
Wednesday, March 9, 2016 6:30 PM
http://www.meetup.com/NYC-Apache-Lucene-Solr-Meetup/events/227839804/

> I am slightly amused to find that either Yonik's blog has no search
> functionality, or the link to it is missing.

Heh...
Google already does a fantastic job at public web search. "CDCR +site:yonik.com"

> I was trying to find the
> CDCR blog post:
>
> http://yonik.com/solr-cross-data-center-replication/
>
> The CDCR functionality is currently present in the master branch, but I
> do not know for sure whether it will be included in the 6.0 release.  I
> am not involved with that feature and have no idea how stable the code is.

My involvement was mostly limited to writing that initial design, and
I think the implementation ended up diverging.
There is CDCR code in trunk, so it will be "released" with 6.0
regardless of its state.

-Yonik


Suggester in SOLR 5.4.2

2016-02-25 Thread jori.gielis
Hi all,

In our setup we are using SOLR for regular search and suggestions.
For the auto complete function we are using SuggestComponent.

In our search index file it is possible to have titles with the same name.
This works as expected for search because every title has a different
subtitle and is therefore unique. However, for the autocomplete function we
would like to see a single term instead of multiple duplicate terms (titles).

For example:
in the autocomplete below we search for "top"
Result: 
Top Gear: 3 hits
Topsy and Tim: 1 hit


[XML suggester response; the tags were stripped by the mail archive.
Recoverable values: responseHeader status=0, QTime=280, command=build,
4 suggestions returned for "top" -- "Top Gear" three times and
"Topsy and Tim" once, each with weight 0.]




We tried Terms component before but it isn't possible to use any
tokenfilters to format the terms in this component.

How can we configure the SuggestComponent to show only one hit for every
unique term (title)?
The search index file should remain the same with duplicate title names.
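
For context, a typical SuggestComponent configuration looks roughly like the
sketch below (the name, lookup implementation, and field are assumptions, not
necessarily the poster's actual config):

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">titleSuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="field">title</str>
      <str name="suggestAnalyzerFieldType">text_general</str>
    </lst>
  </searchComponent>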

Any ideas?

Thanks in advance,

Jori



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggester-in-SOLR-5-4-2-tp4259784.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 6.0

2016-02-25 Thread Shawn Heisey
On 2/25/2016 6:32 AM, Steven White wrote:
> Where can I learn more about the upcoming Solr 6.0?  I understand the
> release date cannot be known, but I hope the features and how it differs
> from 5.x are known.

Yonik (creator of Solr) has a blog post about features in the upcoming
version:

http://yonik.com/solr-6/

I am slightly amused to find that either Yonik's blog has no search
functionality, or the link to it is missing.  I was trying to find the
CDCR blog post:

http://yonik.com/solr-cross-data-center-replication/

The CDCR functionality is currently present in the master branch, but I
do not know for sure whether it will be included in the 6.0 release.  I
am not involved with that feature and have no idea how stable the code is.

You can also look at CHANGES.txt in the master git branch.  When the 6x
branch is created, there's always the possibility that some features
will be removed before release.

https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=blob_plain;f=solr/CHANGES.txt;hb=HEAD

Thanks,
Shawn



Re: (Solr 5.5) How do beginners modify dynamic schema now that it is default?

2016-02-25 Thread Jan Høydahl
Good point.

We should definitely aim for GUI support for adding field types.
Perhaps also support a text-field where people can copy-paste a Schema JSON 
command, e.g. “add-field-type”, that will be processed just as if it was 
POST’ed. A cool extra feature would be to detect whether people paste in a 
schema XML fragment, and auto-convert it to the corresponding REST JSON command 
to add. That way, people can have a tool to quickly convert old field types 
they may have into curl commands…
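
For example, adding a field type via the Schema API today looks like this (a
sketch; the type definition itself is illustrative):

  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field-type": {
      "name": "text_no",
      "class": "solr.TextField",
      "analyzer": {"tokenizer": {"class": "solr.StandardTokenizerFactory"}}
    }
  }' http://localhost:8983/solr/mycore/schema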

Support in the UI for editing all of this would also be handy.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 25. feb. 2016 kl. 04.38 skrev Alexandre Rafalovitch :
> 
> Hi,
> 
> In Solr 5.5, all the shipped examples now use dynamic schema. So, how
> are they expected to add new types? We have "add/delete fields" UI in
> the new Admin UI, but not "add/delete types".
> 
> Do we expect them to use REST end points and curl? Or to not modify
> types at all? Or edit the "do not edit" managed schema?
> 
> I admit being a bit confused about the beginner's path now. Could
> somebody else - more familiar with the context - comment, please!
> 
> Thank you,
>   Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/



Re: both way synonyms with ManagedSynonymFilterFactory

2016-02-25 Thread Jan Høydahl
Created https://issues.apache.org/jira/browse/SOLR-8737 to handle this
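
Until that is fixed, one possible workaround (an untested sketch, using the
same managed API as in the thread below) is to make the expansion explicit,
mapping each term to the full synonym set:

  curl -X PUT -H 'Content-type:application/json' --data-binary
  '{"mb":["mb","megabytes"],"megabytes":["mb","megabytes"]}'
  "http://localhost:8983/solr/test/schema/analysis/synonyms/english"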

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 22. feb. 2016 kl. 11.21 skrev Jan Høydahl :
> 
> Hi
> 
> Did you get any further with this?
> I reproduced your situation with Solr 5.5.
> 
> Think the issue here is that when the SynonymFilter is created based on the 
> managed map, option “expand” is always set to “false”, while the default for 
> file-based synonym dictionary is “true”.
> 
> So with expand=false, what happens is that the input word (e.g. “mb”) is 
> *replaced* with the synonym “megabytes”. Confusingly enough, when synonyms 
> are applied both on index and query side, your document will contain 
> “megabytes” instead of “mb”, but when you query for “mb”, the same happens on 
> query side, so you will actually match :-)
> 
> I think what we need is to switch default to expand=true, and make it 
> configurable also in the managed factory.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> 11. feb. 2016 kl. 10.16 skrev Bjørn Hjelle :
>> 
>> Hi,
>> 
>> One-way managed synonyms seem to work fine, but I cannot make two-way
>> synonyms work.
>> 
>> Steps to reproduce with Solr 5.4.1:
>> 
>> 1. create a core:
>> $ bin/solr create_core -c test -d server/solr/configsets/basic_configs
>> 
>> 2. edit schema.xml so fieldType text_general looks like this:
>> 
>>   <fieldType name="text_general" class="solr.TextField"
>>       positionIncrementGap="100">
>>     <analyzer>
>>       <tokenizer class="solr.StandardTokenizerFactory"/>
>>       <filter class="solr.ManagedSynonymFilterFactory" managed="english"/>
>>     </analyzer>
>>   </fieldType>
>> 
>> 3. reload the core:
>> 
>> $ curl -X GET "
>> http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
>> 
>> 4. add synonyms, one one-way synonym, one two-way, reload the core again:
>> 
>> $ curl -X PUT -H 'Content-type:application/json' --data-binary
>> '{"mad":["angry","upset"]}' "
>> http://localhost:8983/solr/test/schema/analysis/synonyms/english"
>> $ curl -X PUT -H 'Content-type:application/json' --data-binary
>> '["mb","megabytes"]' "
>> http://localhost:8983/solr/test/schema/analysis/synonyms/english"
>> $ curl -X GET "
>> http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
>> 
>> 5. list the synonyms:
>> {
>> "responseHeader":{
>>   "status":0,
>>   "QTime":0},
>> "synonymMappings":{
>>   "initArgs":{"ignoreCase":false},
>>   "initializedOn":"2016-02-11T09:00:50.354Z",
>>   "managedMap":{
>> "mad":["angry",
>>   "upset"],
>> "mb":["megabytes"],
>> "megabytes":["mb"]}}}
>> 
>> 
>> 6. add two documents:
>> 
>> $ bin/post -c test -type 'application/json' -d '[{"id" : "1", "title_t" :
>> "10 megabytes makes me mad" },{"id" : "2", "title_t" : "100 mb should be
>> sufficient" }]'
>> $ bin/post -c test -type 'application/json' -d '[{"id" : "2", "title_t" :
>> "100 mb should be sufficient" }]'
>> 
>> 7. search for the documents:
>> 
>> - all these return the first document, so one-way synonyms work:
>> $ curl -X GET "
>> http://localhost:8983/solr/test/select?q=title_t:angry&indent=true"
>> $ curl -X GET "
>> http://localhost:8983/solr/test/select?q=title_t:upset&indent=true"
>> $ curl -X GET "
>> http://localhost:8983/solr/test/select?q=title_t:mad&indent=true"
>> 
>> - this only returns the document with "mb":
>> 
>> $ curl -X GET "
>> http://localhost:8983/solr/test/select?q=title_t:mb&indent=true"
>> 
>> - this only returns the document with "megabytes"
>> 
>> $ curl -X GET "
>> http://localhost:8983/solr/test/select?q=title_t:megabytes&indent=true"
>> 
>> 
>> Any input on how to make this work would be appreciated.
>> 
>> Thanks,
>> Bjørn
> 



Solr 6.0

2016-02-25 Thread Steven White
Hi,

Where can I learn more about the upcoming Solr 6.0?  I understand the
release date cannot be known, but I hope the features and how it differs
from 5.x are known.

Thank you

Steve


faceting on correlated multi-valued fields?

2016-02-25 Thread Andreas Hubold

Hi,

I'm thinking about indexing articles with tags in a denormalized way as 
follows





<field name="text" type="text_general" indexed="true" stored="true"/>
<field name="tagIds" type="string" indexed="true" stored="true"
    multiValued="true"/>
<field name="tagDescriptions" type="text_general" indexed="true"
    stored="false" multiValued="true"/>


An article can have multiple tags. Each tag has a description and an ID. 
The multi-valued fields tagIds and tagDescriptions have the same length 
and order (tagDescriptions[5] is the description of the tag with ID 
tagId[5]).


Is there a good way to get the IDs of tags with some description (query 
on tagDescriptions) that are used for articles matching some query on 
field text?
I thought about faceting on tagIds but I don't know how to restrict the 
result to the ids with a given description.
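
A sketch of the closest standard approach (collection name illustrative),
which returns global tagIds facet counts for the matching articles but cannot
filter those values by their correlated descriptions:

  curl "http://localhost:8983/solr/articles/select?q=text:foo&rows=0&facet=true&facet.field=tagIds"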


Or would you use a different index schema for this use-case?

I'm still using Solr 4.10.4. Is this something that can be done more 
easily with newer versions?


Thanks for any hints!

Cheers,
Andreas





Re: WhitespaceTokenizerFactory and PathHierarchyTokenizerFactory

2016-02-25 Thread Anil
HI,

The search can be any free text, IP address, or path, and special characters
should not be treated as text delimiters.

10.20 must return 10.20.30.112
/var/log must return /var/log/bigdata
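
For the path/IP part on its own, a PathHierarchyTokenizerFactory field type
would look roughly like this (a sketch, not the combined whitespace+path
solution asked about; for IPs the delimiter would be "."):

  <fieldType name="path_hier" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>

With this, indexing /var/log/bigdata emits /var, /var/log, and
/var/log/bigdata, so a query for /var/log matches.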

Please let me know if you need any additional details. Thanks.

Regards,
Anil

On 25 February 2016 at 18:20, Jack Krupansky 
wrote:

> You still haven't stated exactly what your query requirements are. In Solr
> you should always start with an analysis of how people will expect to query
> the data and then work backwards to how to store and index the data to
> achieve the desired queries.
>
> Note that the standard tokenizer will tokenize all of the elements of a
> path or IP as separate terms. Ditto for a query, so you can effectively do
> both keyword and phrase queries to match individual terms (e.g., path
> elements) or phrases or sequences of path elements or IP address
> components.
>
> -- Jack Krupansky
>
> On Thu, Feb 25, 2016 at 12:41 AM, Anil  wrote:
>
> > Sorry Jack for confusion.
> >
> > I have a field which holds free text. The text can contain a path, an IP,
> > or any free text.
> >
> > I would like to tokenize the text of the field using whitespace. If a
> > token matches a path or IP pattern, it should be tokenized in a
> > path-hierarchy way.
> >
> >
> > Regards,
> > Anil
> >
> > On 24 February 2016 at 21:59, Jack Krupansky 
> > wrote:
> >
> > > Your statement makes no sense. Please clarify. Express your
> > requirement(s)
> > > in plain English first before dragging in possible solutions.
> > Technically,
> > > path elements can have embedded spaces.
> > >
> > > -- Jack Krupansky
> > >
> > > On Wed, Feb 24, 2016 at 6:53 AM, Anil  wrote:
> > >
> > > > HI,
> > > >
> > > > i need to use both WhitespaceTokenizerFactory and
> > > > PathHierarchyTokenizerFactory for use case.
> > > >
> > > > Solr supports only one tokenizer. is there any way we can achieve
> > > > PathHierarchyTokenizerFactory  functionality with filters ?
> > > >
> > > > Please advice.
> > > >
> > > > Regards,
> > > > Anil
> > > >
> > >
> >
>


Re: WhitespaceTokenizerFactory and PathHierarchyTokenizerFactory

2016-02-25 Thread Jack Krupansky
You still haven't stated exactly what your query requirements are. In Solr
you should always start with an analysis of how people will expect to query
the data and then work backwards to how to store and index the data to
achieve the desired queries.

Note that the standard tokenizer will tokenize all of the elements of a
path or IP as separate terms. Ditto for a query, so you can effectively do
both keyword and phrase queries to match individual terms (e.g., path
elements) or phrases or sequences of path elements or IP address components.

-- Jack Krupansky

On Thu, Feb 25, 2016 at 12:41 AM, Anil  wrote:

> Sorry Jack for confusion.
>
> I have a field which holds free text. The text can contain a path, an IP,
> or any free text.
>
> I would like to tokenize the text of the field using whitespace. If a token
> matches a path or IP pattern, it should be tokenized in a path-hierarchy way.
>
>
> Regards,
> Anil
>
> On 24 February 2016 at 21:59, Jack Krupansky 
> wrote:
>
> > Your statement makes no sense. Please clarify. Express your
> requirement(s)
> > in plain English first before dragging in possible solutions.
> Technically,
> > path elements can have embedded spaces.
> >
> > -- Jack Krupansky
> >
> > On Wed, Feb 24, 2016 at 6:53 AM, Anil  wrote:
> >
> > > HI,
> > >
> > > i need to use both WhitespaceTokenizerFactory and
> > > PathHierarchyTokenizerFactory for use case.
> > >
> > > Solr supports only one tokenizer. is there any way we can achieve
> > > PathHierarchyTokenizerFactory  functionality with filters ?
> > >
> > > Please advice.
> > >
> > > Regards,
> > > Anil
> > >
> >
>


multiple sources or mysql imports

2016-02-25 Thread John Blythe
hi all,

i'm currently populating my documents via a mysql query. it occurred to me
that i have another source of similar, helpful data that resides in the same
database, but in another table. the two tables share nothing relationally, so
there's no join i can think of, but both of them contain data i'd like to
have in the index.

what is the best way to handle a multi-source or multi-step import like
that, particularly when both are in mysql?
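
if you're using the DataImportHandler, one common pattern is two sibling root
entities in data-config.xml, each with its own query (a sketch; table and
field names are hypothetical):

  <document>
    <entity name="tableA" query="SELECT id, name FROM table_a"/>
    <entity name="tableB" query="SELECT id, name FROM table_b"/>
  </document>

a full-import then runs both entities into the same index.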

thanks-


Re: Query time de-boost

2016-02-25 Thread Emir Arnautovic

Hi Shamik,
You are right, boosting with values that are lower than 1 is still
positive, but you can boost with a negative value and that should do the
trick, so you can do bq=ContentGroup-local:Developer^-99 (note that it can
result in a negative score).
If you need more than just Developer/Others, you can also introduce an
additional field that can be used for boosting. Also, you can use
dismax/edismax bf to get more control.
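
For instance, a bf sketch (untested; the field and value here are
hypothetical) that subtracts a flat amount instead of multiplying:

  bf=if(termfreq(groupForBoost,'developer'),-100,0)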


Regards,
Emir

On 24.02.2016 17:27, shamik wrote:

Hi Emir,

  I have a bunch of contentgroup values, so boosting them individually is
cumbersome. I have boosting on the query fields

qf=text^6 title^15 IndexTerm^8

and

bq=Source:simplecontent^10 Source:Help^20
(-ContentGroup-local:("Developer"))^99

I was hoping *(-ContentGroup-local:("Developer"))^99* would implicitly boost
the rest, but that didn't happen.

I'm using edismax.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-time-de-boost-tp4259309p4259551.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: What search metrics are useful?

2016-02-25 Thread Charlie Hull

On 24/02/2016 16:20, Walter Underwood wrote:

Click through rate (CTR) is fundamental. That is easy to understand
and integrates well with other business metrics like conversion. CTR
is at least one click anywhere in the result set (first page, second
page, …). Count multiple clicks as a single success. The metric is,
“at least one click”.


I saw an interesting talk last year by Susan Dumais of Microsoft 
Research which contained the surprising (to me) statistic that clicks 
only predict relevance 45% of the time: here's the talk and the original 
paper:

www.iskouk.org/sites/default/files/Dumais_Slides2015-11-06.PDF
http://research.microsoft.com/en-us/um/people/sdumais/TOIS-p147-fox.pdf

One also shouldn't forget manual testing of relevance, something that we 
and Open Source Connections are working on a lot at present. Content 
owners / business types are far better at judging relevance than 
developers who may not understand the rationale. I talked about this at 
the British Computer Society last year:

http://www.flax.co.uk/blog/2015/11/27/search-solutions-2015-towards-new-model-search-relevance-testing/
and here's the tool OSC has developed that we're using: www.quepid.com

Cheers

Charlie



The no-hits rate is sort of useful, but you need to know which queries are
getting no hits, so you can fix it.

For latency metrics, look at 90th percentile or 95th percentile.
Average is useless because response time is a one-sided distribution,
so it will be thrown off by outliers. Percentiles have a direct
customer satisfaction interpretation. 90% of searches were under one
second, for example. Median response time should be very, very fast
because of caching in Solr. During busy periods, our median response
time is about 1.5 ms.

Number of different queries per conversion is a good way to look how
query assistance is working. Things like autosuggest, fuzzy, etc.

About 10% of queries will be misspelled, so you do need to deal with
that.

Finding underperforming queries is trickier. I really need to write
an article on that.





“Search Analytics for Your Site” by Lou Rosenfeld is a good
introduction.

http://rosenfeldmedia.com/books/search-analytics-for-your-site/


Sea Urchin is doing some good work in search metrics:
https://seaurchin.io/ 

wunder Walter Underwood wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog) Search Guy, Chegg


On Feb 24, 2016, at 2:38 AM, Emir Arnautovic
 wrote:

Hi Bill, You can take a look at Sematext's search analytics
(https://sematext.com/search-analytics). It provides some of
metrics you mentioned, plus some additional (top queries, CTR,
click stats, paging stats etc.). In combination with Sematext's
performance metrics (https://sematext.com/spm) you can have full
picture of your search infrastructure.

Regards, Emir

-- Monitoring * Alerting * Anomaly Detection * Centralized Log
Management Solr & Elasticsearch Support * http://sematext.com/


On 24.02.2016 04:07, William Bell wrote:

How do others look at search metrics?

1. Search conversion? Do you look at searches and if the user
does not click on a result, and reruns the search that would be a
failure?

2. How to measure auto complete success metrics?

3. Facets/filters could be considered negative, since we did not
find the results that the user wanted, and now they are filtering
- how to measure?

4. One easy metric is searches with 0 results. We could auto
expand the geo distance or ask the user "did you mean?"

5. Another easy one would be tech performance: "time it takes in
seconds to get a result".

6. How to measure fuzzy? How do you know you need more synonyms?
How to measure?

7. How many searches it takes before the user clicks on a
result?

Other ideas? Is there a video or presentation on search metrics
that would be useful?









--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk