Re: Solr Basic Configuration - Highlight - Beginner

2015-12-15 Thread Erick Erickson
How are you trying to display the results? Highlighting is a bit of an
odd beast. Assuming it's correctly configured, the response packet
will have a separate highlight section; it's the application's
responsibility to present that pleasingly.
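Enabling it is mostly a matter of query parameters, e.g. something like:

  http://localhost:8983/solr/test/select?q=peace&hl=true&hl.fl=content&hl.snippets=3&hl.fragsize=100

where "content" is just a placeholder for whatever field actually holds your
extracted text; that field must be stored, or no snippets can be generated.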

What _do_ you get back in the response?

BTW, the mail server pretty aggressively strips attachments, yours
didn't come through.

Best,
Erick

On Tue, Dec 15, 2015 at 3:25 AM, Evert R.  wrote:
> Hi there!
>
> It's my first installation, not sure if this is the right channel...
>
> Here are my steps:
>
> 1. Set up a basic install of solr 5.4.0
>
> 2. Create a new core through command line (bin/solr create -c test)
>
> 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>
> 4. Query from the browser and it returns the correct results, but it does not
> show the part of the text I am querying, the highlight.
>
>   I have already flagged the 'hl' option. But still it does not work...
>
> Example: I am looking for the word 'peace' in my pdf file (a book). I have 4
> matches for this word; it shows me the book name (pdf file) but does not
> show which part of the text contains the word 'peace'.
>
>
> I am probably missing some configuration in schema.xml, which is missing
> from my folder /solr/server/solr/test/conf/
>
> Or even the solrconfig.xml...
>
> I have read a bunch of things about highlighting, checked these files, copied
> the standard schema.xml to my core/conf folder, but still it does not show the
> highlight.
>
>
> Attached a copy of my solrconfig.xml file.
>
>
> I am very sorry for this probably dumb and too basic question... It's the
> first time I have seen Solr live.
>
>
> Any help will be appreciated.
>
>
>
> Best regards,
>
>
> Evert Ramos
>
> evert.ra...@gmail.com
>


Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-15 Thread Charlie Hull

On 15/12/2015 14:13, Vikram Parmar wrote:

Hi Mikhail,


Hi,

In case you're interested, several years ago we prototyped a Lucene 
codec using Redis for just this sort of application:

http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

It's a slightly crazy idea but appeared to work :)

Charlie


Thanks for chiming in. Looking forward to your post regarding updatable
numeric DocValues.

What would be the 2nd most promising approach for now? Would you say EFF
should be OK to go with?

Is updating and reloading the EFF external file (containing millions of lines)
at very short intervals fine? Say every 10 seconds?

Thanks!

On Tue, Dec 15, 2015 at 5:46 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:


I believe https://issues.apache.org/jira/browse/SOLR-5944 is the most
promising approach for such scenarios.
Even though it's not yet delivered in the distro.
We are going to publish a post about it at blog.griddynamics.com.

FWIW, I suppose EFF can be returned in result list.



On Fri, Dec 11, 2015 at 1:48 PM, Vikram Parmar 
wrote:


We are creating a web application which would contain posts (something like
FB or say Youtube). For the stable part of the data (i.e. the facets, search
results & its content), we plan to use SOLR.

What should we use for the unstable part of the data (i.e. dynamic and
volatile content such as Like counts, Comments counts, Viewcounts)?


Option 1) Redis

What about storing the "dynamic" data in a different data store (like
Redis)? Thus, every time the counts get refreshed, I do not have to reindex
the data into SOLR at all. Thus SOLR indexing is only triggered when new
posts are added to the site, and never on any activity on the posts by the
users.

Side-note :-
I also looked at the SOLR-Redis plugin at
https://github.com/sematext/solr-redis

The plugin looks good, but not sure if the plugin can be used to fetch the
data stored in Redis as part of the solr result set, i.e. in docs. The
description looks more like the Redis data can be used in the function
queries for boosting, sorting, etc. Does anyone have experience with this?


Option 2) SOLR NRT with Soft Commits

We would depend on the in-built NRT features. Let's say we do soft-commits
every second and hard-commits every 10 seconds. Suppose a huge amount of
dynamic data is created on the site across hundreds of posts, e.g. 10
likes across 1 posts. Thus, this would mean soft-committing on 1
rows every second. And then hard-committing those many rows every 10
seconds. Isn't this overkill?


Which option is preferred? How would you compare both options in terms of
scalability, maintenance, feasibility, best-practices, etc? Any real-life
experiences or links to articles?

Many thanks!


p.s. EFF (external file fields) is not an option, as I read that the data
in that file can only be used in function queries and cannot be returned as
part of a document.





--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics









--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
You would need to define an alternate field which copied a base field but
then had the desired alternate similarity, using SchemaSimilarityFactory.

See:
https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements
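Roughly, the schema ends up looking something like this (the type and field
names here are only examples):

  <similarity class="solr.SchemaSimilarityFactory"/>

  <fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <similarity class="solr.BM25SimilarityFactory"/>
  </fieldType>

  <field name="title_bm25" type="text_bm25" indexed="true" stored="false"/>
  <copyField source="title" dest="title_bm25"/>

You then pick the similarity at query time simply by querying title or title_bm25.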


-- Jack Krupansky

On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan  wrote:

> Hi guys,
>
> Is there a way to alter the similarity class at runtime, with a parameter?
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
>


Re: solrcloud used a lot of memory and memory keep increasing during long time run

2015-12-15 Thread zhenglingyun
Thank you very much.
I will try reducing the heap memory and check whether the memory still keeps
increasing or not.

> On Dec 15, 2015, at 19:37, Rahul Ramesh  wrote:
> 
> You should actually decrease the Solr heap size. Let me explain a bit.
> 
> Solr requires very little heap memory for its operation and more memory for
> storing data in main memory. This is because Solr uses mmap for storing the
> index files.
> Please check the link
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for
> understanding how Solr operates on files.
> 
> Solr has the typical problem of garbage collection once you set the heap size
> to a large value. It will have indeterminate pauses due to GC. The amount of
> heap memory required is difficult to tell. However, the way we tuned this
> parameter is by setting it to a low value and increasing it by 1Gb whenever
> OOM is thrown.
> 
> Please check the problem of having large Java Heap
> 
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
> 
> 
> Just for your reference, in our production setup, we have data of around
> 60Gb/node spread across 25 collections. We have configured 8GB as heap and
> the rest of the memory we will leave it to OS to manage. We do around 1000
> (search + Insert)/second on the data.
> 
> I hope this helps.
> 
> Regards,
> Rahul
> 
> 
> 
> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun  wrote:
> 
>> Hi, list
>> 
>> I’m new to solr. Recently I encounter a “memory leak” problem with
>> solrcloud.
>> 
>> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
>> have
>> one collection with about 400k docs. The index size of the collection is
>> about
>> 500MB. Memory for solr is 16GB.
>> 
>> Following is "ps aux | grep solr” :
>> 
>> /usr/java/jdk1.7.0_67-cloudera/bin/java
>> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>> -Xloggc:/var/log/solr/gc.log
>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>> -Dsolr.authentication.simple.anonymous.allowed=true
>> -Dsolr.security.proxyuser.hue.hosts=*
>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>> -Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr
>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>> -Xloggc:/var/log/solr/gc.log
>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>> -Dsolr.authentication.simple.anonymous.allowed=true
>> -Dsolr.security.proxyuser.hue.hosts=*
>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
>> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>> -Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr
>> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
>> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
>> -Dcatalina.base=/var/lib/solr/tomcat-deployment
>> 

Re: Solr Basic Configuration - Highlight - Beginner

2015-12-15 Thread Evert R.
Hi Erick,

Thank you very much for the reply!!

I do get back the full text, author, and a whole lot of stuff which doesn't
really matter for my project.

So, what you are saying is that Solr gets me back the full content and
my application will fix the rest? Which means for me that for all my books (pdf
files), when searching for a specific word it will bring me the whole book
content that matches the requested query. And my application (PHP) in this
case... will take care of showing only part of the text (such as in a highlight,
as I was understanding) and highlight the keyword I was looking for?

If so, Erick, you gave me a big help clearing out... I thought I would do
that with Solr in an easy way. =)

Thanks for the attachments tip!

Best regards,

​Evert​

2015-12-15 14:56 GMT-02:00 Erick Erickson :

> How are you trying to display the results? Highlighting is a bit of an
> odd beast. Assuming it's correctly configured, the response packet
> will have a separate highlight section; it's the application's
> responsibility to present that pleasingly.
>
> What _do_ you get back in the response?
>
> BTW, the mail server pretty aggressively strips attachments, yours
> didn't come through.
>
> Best,
> Erick
>
> On Tue, Dec 15, 2015 at 3:25 AM, Evert R.  wrote:
> > Hi there!
> >
> > It's my first installation, not sure if this is the right channel...
> >
> > Here are my steps:
> >
> > 1. Set up a basic install of solr 5.4.0
> >
> > 2. Create a new core through command line (bin/solr create -c test)
> >
> > 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
> >
> > 4. Query from the browser and it returns the correct results, but it does
> > not show the part of the text I am querying, the highlight.
> >
> >   I have already flagged the 'hl' option. But still it does not work...
> >
> > Example: I am looking for the word 'peace' in my pdf file (a book). I have 4
> > matches for this word; it shows me the book name (pdf file) but does not
> > show which part of the text contains the word 'peace'.
> >
> >
> > I am probably missing some configuration in schema.xml, which is missing
> > from my folder /solr/server/solr/test/conf/
> >
> > Or even the solrconfig.xml...
> >
> > I have read a bunch of things about highlighting, checked these files,
> > copied the standard schema.xml to my core/conf folder, but still it does
> > not show the highlight.
> >
> >
> > Attached a copy of my solrconfig.xml file.
> >
> >
> > I am very sorry for this probably dumb and too basic question... It's the
> > first time I have seen Solr live.
> >
> >
> > Any help will be appreciated.
> >
> >
> >
> > Best regards,
> >
> >
> > Evert Ramos
> >
> > evert.ra...@gmail.com
> >
>


Re: similarity as a parameter

2015-12-15 Thread Chris Hostetter

: I think this is a legitimate request. Majority of the similarities are 
: compatible index wise. I think the only exception is sweet spot 
: similarity.

I think you are grossly underestimating the risk of arbitrarily using diff 
Similarities between index time and query time -- particularly in how norms 
are computed.  But for the sake of argument let's assume you know what 
you are doing, you are careful, and you know that the index time 
similarity you used is compatible with N diff query time similarities you 
want to choose between...

: I wonder what solr-plugin would be best for this functionality. 
: How about a custom search component, in its prepare method?
: 
: I think we can access (Solr)IndexSearcher inside a SearchComponent.
: setSimilarity in the process method should work.

this would be *very* dangerous to do, because the SolrIndexSearcher is 
shared across all requests -- so you'd get race conditions and 
non-deterministic behavior from diff queries not getting the similarity 
they expected.

The only sane way to do this on a per-request basis would either be...

1) wrap the IndexSearcher in a new IndexSearcherWrapper that returned a 
per-request Similarity.

2) modify the Query objects themselves so that createWeight uses the 
similarity you want instead of delegating to IndexSearcher.getSimilarity 
(see for example how BooleanQuery/BooleanWeight use the "disableCoord" 
property of the query to decide whether or not to care about Similarity.coord).


Depending on your real world use case / goal, I would suspect that either 
way a QParser that wraps the constructed query is going to be the 
simplest/cleanest solution regardless of whether #1 or #2 makes the most 
sense -- perhaps even achieving #2 by using #1 so that createWeight in 
your new QueryWrapper class does the IndexSearcher wrapping before 
delegating.




-Hoss
http://www.lucidworks.com/


Re: solrcloud used a lot of memory and memory keep increasing during long time run

2015-12-15 Thread Erick Erickson
Rahul's comments were spot on. You can gain more confidence that this
is normal if you try attaching a memory reporting program (jconsole
is one): you'll see the memory grow for quite a while, then garbage
collection kicks in and you'll see it drop in a sawtooth pattern.
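
If you prefer the command line, something like

  jstat -gcutil <solr-pid> 5000

gives the same picture: it prints heap-generation usage every 5 seconds, and
you should see the old-generation percentage climb and then fall back after
each collection.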

Best,
Erick

On Tue, Dec 15, 2015 at 8:19 AM, zhenglingyun  wrote:
> Thank you very much.
> I will try reducing the heap memory and check whether the memory still keeps
> increasing or not.
>
>> On Dec 15, 2015, at 19:37, Rahul Ramesh  wrote:
>>
>> You should actually decrease the Solr heap size. Let me explain a bit.
>>
>> Solr requires very little heap memory for its operation and more memory for
>> storing data in main memory. This is because Solr uses mmap for storing the
>> index files.
>> Please check the link
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for
>> understanding how Solr operates on files.
>>
>> Solr has the typical problem of garbage collection once you set the heap size
>> to a large value. It will have indeterminate pauses due to GC. The amount of
>> heap memory required is difficult to tell. However, the way we tuned this
>> parameter is by setting it to a low value and increasing it by 1Gb whenever
>> OOM is thrown.
>>
>> Please check the problem of having large Java Heap
>>
>> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>>
>>
>> Just for your reference, in our production setup, we have data of around
>> 60Gb/node spread across 25 collections. We have configured 8GB as heap and
>> the rest of the memory we will leave it to OS to manage. We do around 1000
>> (search + Insert)/second on the data.
>>
>> I hope this helps.
>>
>> Regards,
>> Rahul
>>
>>
>>
>> On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun  wrote:
>>
>>> Hi, list
>>>
>>> I’m new to solr. Recently I encounter a “memory leak” problem with
>>> solrcloud.
>>>
>>> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
>>> have
>>> one collection with about 400k docs. The index size of the collection is
>>> about
>>> 500MB. Memory for solr is 16GB.
>>>
>>> Following is "ps aux | grep solr” :
>>>
>>> /usr/java/jdk1.7.0_67-cloudera/bin/java
>>> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
>>> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>> -Xloggc:/var/log/solr/gc.log
>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>> -Dsolr.security.proxyuser.hue.hosts=*
>>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
>>> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
>>> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
>>> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
>>> -Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr
>>> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
>>> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
>>> -Dsolr.hdfs.blockcache.blocksperbank=16384
>>> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
>>> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
>>> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
>>> -Xloggc:/var/log/solr/gc.log
>>> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
>>> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
>>> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
>>> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
>>> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
>>> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
>>> -Dsolr.authentication.simple.anonymous.allowed=true
>>> -Dsolr.security.proxyuser.hue.hosts=*
>>> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
>>> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=

Re: similarity as a parameter

2015-12-15 Thread Ahmet Arslan
Hi Dmitry,

I think this is a legitimate request. The majority of the similarities are 
compatible index-wise. I think the only exception is sweet spot similarity.

In Lucene, it can be changed on the fly with a new Searcher. It should be 
possible to do so in solr.

Thanks,
Ahmet



On Tuesday, December 15, 2015 6:08 PM, Jack Krupansky 
 wrote:
You would need to define an alternate field which copied a base field but
then had the desired alternate similarity, using SchemaSimilarityFactory.

See:
https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements


-- Jack Krupansky


On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan  wrote:

> Hi guys,
>
> Is there a way to alter the similarity class at runtime, with a parameter?
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
>


Re: Collection API migrate statement

2015-12-15 Thread Erick Erickson
You might look at collection aliasing; this is sometimes used for
time-series data (which I'm guessing this is).

But I have to ask whether migrating stuff around like that is really
necessary. 2M docs isn't very many; have you stress tested with just
indexing them all to a single collection? Is the traffic on the Hot
part really heavy enough to warrant the complexity?

Best,
Erick

On Tue, Dec 15, 2015 at 7:57 AM, philippa griggs
 wrote:
> Hello,
>
>
> Solr 5.2.1.
>
>
> I'm using the collection API migrate statement in our test environment with 
> a view to implementing a Hot/Cold arrangement - newer documents will be kept 
> on the Hot collection and each night the oldest documents will be migrated 
> into the Cold collection. I've got it all working with a small amount of 
> documents (around 28,000).
>
>
> I'm now trying to migrate around 200,000 documents and am getting a 'migrate 
> the collection time out:180s' message back.
>
>
> The logs from the source collection are:
>
>
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Successfully created 
> replica of temp source collection on target leader node
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Requesting merge of temp 
> source collection replica to target leader
> INFO  - 2015-12-15 14:45:36.648; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeDeleted fired on 
> path /overseer/collection-queue-work/qnr-04 state SyncConnected
> INFO  - 2015-12-15 14:45:36.651; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged 
> fired on path /overseer/collection-queue-work state SyncConnected
> ERROR - 2015-12-15 14:45:36.651; [   ] org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: migrate the collection time out:180s
> at 
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)
> etc
>
>
> The logs from the target collection are:
>
> INFO  - 2015-12-15 14:43:19.128; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] org.apache.solr.update.UpdateLog; 
> Took 22 ms to seed version buckets with highest version 1520634636692094979
> INFO  - 2015-12-15 14:43:19.129; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] 
> org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. 
> core=split_shard1_temp_shard2_shard1_replica2
> INFO  - 2015-12-15 14:43:19.199; [   ] 
> org.apache.solr.update.DirectUpdateHandler2; start mergeIndexes{}
>
> As there are no errors in the target collection, am I right in assuming the 
> timeout occurred because the merge took too long? If that is so, how do I 
> increase the timeout period? Ideally I will need to migrate around 2 million 
> documents a night.
>
>
> Any help would be much appreciated.
>
>
> Philippa
>
>


RE: similarity as a parameter

2015-12-15 Thread Markus Jelsma
Hello Dmitry - this is currently not possible. The quickest way is to reconfigure 
and reload the cores. Some similarities also require you to reindex, so it is a 
bad idea anyway.
Markus 
 
-Original message-
> From:Dmitry Kan 
> Sent: Tuesday 15th December 2015 16:02
> To: solr-user@lucene.apache.org
> Subject: similarity as a parameter
> 
> Hi guys,
> 
> Is there a way to alter the similarity class at runtime, with a parameter?
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
Markus, Jack,

I think Ahmet nails it pretty nicely: the similarity functions in question
are compatible on the index level. So it is not necessary to create a
separate search field.

Ahmet, I like your idea. Will take a look, thanks.

Rgds,
Dmitry

On Tue, Dec 15, 2015 at 7:58 PM, Ahmet Arslan 
wrote:

>
>
> I wonder what solr-plugin would be best for this functionality.
> How about a custom search component, in its prepare method?
>
> I think we can access (Solr)IndexSearcher inside a SearchComponent.
> setSimilarity in the process method should work.
>
> Ahmet
>
>
> On Tuesday, December 15, 2015 7:43 PM, Ahmet Arslan
>  wrote:
> Hi Dmitry,
>
> I think this is a legitimate request. Majority of the similarities are
> compatible index wise. I think the only exception is sweet spot similarity.
>
> In Lucene, it can be changed on the fly with a new Searcher. It should be
> possible to do so in solr.
>
> Thanks,
> Ahmet
>
>
>
>
> On Tuesday, December 15, 2015 6:08 PM, Jack Krupansky <
> jack.krupan...@gmail.com> wrote:
> You would need to define an alternate field which copied a base field but
> then had the desired alternate similarity, using SchemaSimilarityFactory.
>
> See:
> https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements
>
>
> -- Jack Krupansky
>
>
> On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan  wrote:
>
> > Hi guys,
> >
> > Is there a way to alter the similarity class at runtime, with a
> parameter?
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: www.semanticanalyzer.info
> >
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: similarity as a parameter

2015-12-15 Thread Chris Hostetter

: Sweetspot does require reindexing but is that the only one? I have not 
: investigated some exotic implementations, anyone to confirm sweetspot is 
: the only one? In that case you could patch QueryComponent right, instead 
: of having a custom component?

I'm not sure where this thread developed this weird assumption that 
switching from/to SweetSpotSimilarity in particular requires reindexing 
but that many/most other Similarities wouldn't require this ...  
SweetSpotSimilarity certainly has explicit config options for tuning the 
index time field norm, but it's not a special case...

1) Solr shouldn't make any naive assumptions about whatever 
arbitrary (custom) Similarity class a user might provide -- particularly 
when it comes to field norms, since all of the Similarity base classes 
/ callers have been set up to make it trivial for people to write custom 
similarities for the express purpose of adjusting how many bits are used 
by field norms.

2) In both ClassicSimilarity and BM25Similarity (the new default in Solr6) 
the config option "discountOverlaps" impacts what norm values get encoded 
at index time for a given field length -- so it's possible to break things 
w/o even switching what class you use, w/o even considering custom Similarity 
impls (or new out-of-the-box similarity classes that might be added to 
Solr tomorrow).


-Hoss
http://www.lucidworks.com/


Re: similarity as a parameter

2015-12-15 Thread Dmitry Kan
Hi Hoss,

Thanks for sharing the knowledge on dangerous zones, I will try to avoid
them. #2 is quite a probable way of implementing this in my case, as many
Query objects are custom (although not all). But #1 is compelling too and
sounds like a bit less trouble.

On Tue, Dec 15, 2015 at 8:13 PM, Chris Hostetter 
wrote:

>
> : I think this is a legitimate request. Majority of the similarities are
> : compatible index wise. I think the only exception is sweet spot
> : similarity.
>
> I think you are grossly underestimating the risk of arbitrarily using diff
> Similarities between index time and query time -- particularly in how norms
> are computed.  But for the sake of argument let's assume you know what
> you are doing, you are careful, and you know that the index time
> similarity you used is compatible with N diff query time similarities you
> want to choose between...
>
> : I wonder what solr-plugin would be best for this functionality.
> : How about a custom search component, in its prepare method?
> :
> : I think we can access (Solr)IndexSearcher inside a SearchComponent.
> : setSimilarity in the process method should work.
>
> this would be *very* dangerous to do, because the SolrIndexSearcher is
> shared across all requests -- so you'd get race conditions and
> non-deterministic behavior from diff queries not getting the similarity
> they expected.
>
> The only sane way to do this on a per-request basis would either be...
>
> 1) wrap the IndexSearcher in a new IndexSearcherWrapper that returned a
> per-request Similarity.
>
> 2) modify the Query objects themselves so that createWeight uses the
> similarity you want instead of delegating to IndexSearcher.getSimilarity
> (see for example how BooleanQuery/BooleanWeight use the "disableCoord"
> property of the query to decide whether or not to care about
> Similarity.coord).
>
>
> Depending on your real world use case / goal, I would suspect that either
> way a QParser that wraps the constructed query is going to be the
> simplest/cleanest solution regardless of whether #1 or #2 makes the most
> sense -- perhaps even achieving #2 by using #1 so that createWeight in
> your new QueryWrapper class does the IndexSearcher wrapping before
> delegating.
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>



-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


RE: similarity as a parameter

2015-12-15 Thread Markus Jelsma
Sweetspot does require reindexing but is that the only one? I have not 
investigated some exotic implementations, anyone to confirm sweetspot is the 
only one? In that case you could patch QueryComponent right, instead of having 
a custom component?

M.
 
 
-Original message-
> From:Dmitry Kan 
> Sent: Tuesday 15th December 2015 19:07
> To: solr-user@lucene.apache.org; Ahmet Arslan 
> Subject: Re: similarity as a parameter
> 
> Markus, Jack,
> 
> I think Ahmet nails it pretty nicely: the similarity functions in question
> are compatible on the index level. So it is not necessary to create a
> separate search field.
> 
> Ahmet, I like your idea. Will take a look, thanks.
> 
> Rgds,
> Dmitry
> 
> On Tue, Dec 15, 2015 at 7:58 PM, Ahmet Arslan 
> wrote:
> 
> >
> >
> > I wonder what solr-plugin would be best for this functionality.
> > How about a custom search component, in its prepare method?
> >
> > I think we can access (Solr)IndexSearcher inside a SearchComponent.
> > setSimilarity in the process method should work.
> >
> > Ahmet
> >
> >
> > On Tuesday, December 15, 2015 7:43 PM, Ahmet Arslan
> >  wrote:
> > Hi Dmitry,
> >
> > I think this is a legitimate request. Majority of the similarities are
> > compatible index wise. I think the only exception is sweet spot similarity.
> >
> > In Lucene, it can be changed on the fly with a new Searcher. It should be
> > possible to do so in solr.
> >
> > Thanks,
> > Ahmet
> >
> >
> >
> >
> > On Tuesday, December 15, 2015 6:08 PM, Jack Krupansky <
> > jack.krupan...@gmail.com> wrote:
> > You would need to define an alternate field which copied a base field but
> > then had the desired alternate similarity, using SchemaSimilarityFactory.
> >
> > See:
> > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements
> >
> >
> > -- Jack Krupansky
> >
> >
> > On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan  wrote:
> >
> > > Hi guys,
> > >
> > > Is there a way to alter the similarity class at runtime, with a
> > parameter?
> > >
> > > --
> > > Dmitry Kan
> > > Luke Toolbox: http://github.com/DmitryKey/luke
> > > Blog: http://dmitrykan.blogspot.com
> > > Twitter: http://twitter.com/dmitrykan
> > > SemanticAnalyzer: www.semanticanalyzer.info
> > >
> >
> 
> 
> 
> -- 
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
> 


Re: Solr Basic Configuration - Highlight - Beginner

2015-12-15 Thread Erick Erickson
No, that's not what I meant. The highlight component adds a special
section to the return packet that will contain "snippets" of text with
highlights. You control how big those snippets are via various
parameters in the highlight component and they'll have the tags you
specify for highlighting.

Your app needs to pull the information from the highlight portion of
the response packet rather than the document list. Just execute your
queries via cURL or a browser and look at the structure of the response;
you'll see what I mean.
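
For example (assuming the extracted text ended up in a stored field called
"content" -- check your schema), a request like

  curl 'http://localhost:8983/solr/test/select?q=peace&fl=id,title&hl=true&hl.fl=content&wt=json'

comes back with the usual "response"/"docs" section plus something shaped
roughly like this (the id and snippet text here are made up):

  "highlighting": {
    "/docs/test/book.pdf": {
      "content": ["... a lasting <em>peace</em> was finally ..."]
    }
  }

Your PHP code reads the snippets from that section, keyed by each document's
uniqueKey.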

And note that you do _not_ need to return the fields you're
highlighting in the "fl" list so you do _not_ need to return the
entire document contents.

What are you using to display the results anyway?

Best,
Erick

On Tue, Dec 15, 2015 at 10:02 AM, Evert R.  wrote:
> Hi Erick,
>
> Thank you very much for the reply!!
>
> I do get back the full text, author, and a whole lot of stuff which doesn't
> really matter for my project.
>
> So, what you are saying is that Solr gets me back the full content and
> my application will fix the rest? Which means for me that for all my books (pdf
> files), when searching for a specific word it will bring me the whole book
> content that matches the requested query. And my application (PHP) in this
> case... will take care of showing only part of the text (such as in a highlight,
> as I was understanding) and highlight the keyword I was looking for?
>
> If so, Erick, you gave me a big help clearing out... I thought I would do
> that with Solr in an easy way. =)
>
> Thanks for the attachments tip!
>
> Best regards,
>
> Evert
>
> 2015-12-15 14:56 GMT-02:00 Erick Erickson :
>
>> How are you trying to display the results? Highlighting is a bit of an
>> odd beast. Assuming it's correctly configured, the response packet
>> will have a separate highlight section; it's the application's
>> responsibility to present that pleasingly.
>>
>> What _do_ you get back in the response?
>>
>> BTW, the mail server pretty aggressively strips attachments, yours
>> didn't come through.
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 15, 2015 at 3:25 AM, Evert R.  wrote:
>> > Hi there!
>> >
>> > It's my first installation, not sure if this is the right channel...
>> >
>> > Here are my steps:
>> >
>> > 1. Set up a basic install of solr 5.4.0
>> >
>> > 2. Create a new core through command line (bin/solr create -c test)
>> >
>> > 3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)
>> >
>> > 4. Query from the browser and it returns the correct results, but it does
>> > not show the part of the text I am querying, the highlight.
>> >
>> >   I have already flagged the 'hl' option. But still it does not work...
>> >
>> > Example: I am looking for the word 'peace' in my pdf file (a book). I have 4
>> > matches for this word; it shows me the book name (pdf file) but does not
>> > show which part of the text contains the word 'peace'.
>> >
>> >
>> > I am probably missing some configuration in schema.xml, which is missing
>> > from my folder /solr/server/solr/test/conf/
>> >
>> > Or even the solrconfig.xml...
>> >
>> > I have read a bunch of things about highlighting, checked these files,
>> > copied the standard schema.xml to my core/conf folder, but still it does
>> > not show the highlight.
>> >
>> >
>> > Attached a copy of my solrconfig.xml file.
>> >
>> >
>> > I am very sorry for this probably dumb and too basic question... It's the
>> > first time I have seen Solr live.
>> >
>> >
>> > Any help will be appreciated.
>> >
>> >
>> >
>> > Best regards,
>> >
>> >
>> > Evert Ramos
>> >
>> > evert.ra...@gmail.com
>> >
>>


Re: similarity as a parameter

2015-12-15 Thread Ahmet Arslan


I wonder what solr-plugin would be best for this functionality. 
How about a custom search component, in its prepare method?

I think we can access (Solr)IndexSearcher inside a SearchComponent.
setSimilarity in the process method should work.

Ahmet


On Tuesday, December 15, 2015 7:43 PM, Ahmet Arslan  
wrote:
Hi Dmitry,

I think this is a legitimate request. The majority of the similarities are 
compatible index-wise. I think the only exception is sweet spot similarity.

In Lucene, it can be changed on the fly with a new Searcher. It should be 
possible to do so in solr.

Thanks,
Ahmet




On Tuesday, December 15, 2015 6:08 PM, Jack Krupansky 
 wrote:
You would need to define an alternate field which copied a base field but
then had the desired alternate similarity, using SchemaSimilarityFactory.

See:
https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements


-- Jack Krupansky


On Tue, Dec 15, 2015 at 10:02 AM, Dmitry Kan  wrote:

> Hi guys,
>
> Is there a way to alter the similarity class at runtime, with a parameter?
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: www.semanticanalyzer.info
>


Re: Collection API migrate statement

2015-12-15 Thread Shalin Shekhar Mangar
The migrate is a long-running operation. Please use it along with the
async parameter so that it can execute in
the background. Then you can use the request status API to poll and
wait until the operation completes. If there is any error then the
same request status API will return the response. See
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RequestStatus
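
Roughly, that looks like this (the collection names and request id below are
just examples, and split.key is whatever you are already using):

  /admin/collections?action=MIGRATE&collection=HotSessions&target.collection=Cold&split.key=<your-key>&async=migrate-20151215

followed by polling

  /admin/collections?action=REQUESTSTATUS&requestid=migrate-20151215

until the status comes back as completed (or failed).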

On Tue, Dec 15, 2015 at 9:27 PM, philippa griggs
 wrote:
> Hello,
>
>
> Solr 5.2.1.
>
>
> I'm using the collection API migrate statement in our test environment with 
> a view to implementing a Hot/Cold arrangement - newer documents will be kept 
> on the Hot collection and each night the oldest documents will be migrated 
> into the Cold collection. I've got it all working with a small amount of 
> documents (around 28,000).
>
>
> I'm now trying to migrate around 200,000 documents and am getting a 'migrate 
> the collection time out:180s' message back.
>
>
> The logs from the source collection are:
>
>
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Successfully created 
> replica of temp source collection on target leader node
> INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
> org.apache.solr.cloud.OverseerCollectionProcessor; Requesting merge of temp 
> source collection replica to target leader
> INFO  - 2015-12-15 14:45:36.648; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeDeleted fired on 
> path /overseer/collection-queue-work/qnr-04 state SyncConnected
> INFO  - 2015-12-15 14:45:36.651; [   ] 
> org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged 
> fired on path /overseer/collection-queue-work state SyncConnected
> ERROR - 2015-12-15 14:45:36.651; [   ] org.apache.solr.common.SolrException; 
> org.apache.solr.common.SolrException: migrate the collection time out:180s
> at 
> org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)
> etc
>
>
> The logs from the target collection are:
>
> INFO  - 2015-12-15 14:43:19.128; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] org.apache.solr.update.UpdateLog; 
> Took 22 ms to seed version buckets with highest version 1520634636692094979
> INFO  - 2015-12-15 14:43:19.129; [split_shard1_temp_shard2 shard1  
> split_shard1_temp_shard2_shard1_replica2] 
> org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. 
> core=split_shard1_temp_shard2_shard1_replica2
> INFO  - 2015-12-15 14:43:19.199; [   ] 
> org.apache.solr.update.DirectUpdateHandler2; start mergeIndexes{}
>
> As there are no errors in the target collection, am I right in assuming the 
> timeout occurred because the merge took too long? If that is so, how do I 
> increase the timeout period? Ideally I will need to migrate around 2 million 
> documents a night.
>
>
> Any help would be much appreciated.
>
>
> Philippa
>
>



-- 
Regards,
Shalin Shekhar Mangar.


Issue in Geospatial Search

2015-12-15 Thread Shenbagarajan
Hello,

I am trying to implement geo spatial search in solr by referring the below
site.
https://cwiki.apache.org/confluence/display/solr/Spatial+Search

Every time I try to execute it I am getting the same error as below:
 "msg":"The field latlon does not support spatial filtering",

When I try to run the same query in Solr 4.10 it works fine without any
issues. But in Solr 5.3 it's not working properly. Please point me to where
the issue is, as I am stuck with it :(

Thanks,
Shen.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-in-Geospatial-Search-tp4245441.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is DIH going to be removed from Solr future versions?

2015-12-15 Thread Upayavira
I doubt DIH will be "removed". It more likely will be relegated - still
there, but emphasised less.

Another possibility that has been mooted is to extract it, so that it
can run outside of Solr. This strikes me as the best option. Having it
run inside Solr strikes me as architecturally wrong, and also
problematic in a SolrCloud world. By taking the DIH codebase and running it
*outside* Solr you get the best of DIH without the same set of issues.

Upayavira

On Tue, Dec 15, 2015, at 05:47 AM, Anil Cherian wrote:
> Dear Team,
> 
> I use DIH extensively and even wrote my own custom transformers in some
> situations.
> Recently, during an architecture discussion, one of my team members told me
> that Solr is going to take away DIH from its future versions.
> 
> Is that true?
> 
> Also, is using DIH for, say, 2 or 3 million docs a good option for indexing
> an XML content data set? I am planning to use it either by calling separate
> entities in parallel or via multiple /dataimport handlers in solrconfig.xml.
> 
> Could you please reply at your earliest convenience, as it is an important
> decision for us whether to continue with DIH or not!
> 
> Thanks and Rgds,
> Anil.


Re: Memory leak in SolrCloud 4.6

2015-12-15 Thread Emir Arnautovic

Hi Mark,
Can you tell us a bit more about your index and load? Why do you think 
there is a leak? If you give that memory to the JVM it will use it, and you 
gave most of it to the JVM. Only 4GB is left for the OS and disk caches. Since 
swap is enabled, it might swap some JVM pages. It seems to me like a 
completely valid scenario. Try running Solr with a smaller heap and set 
swappiness to 1. What OS do you use?
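
On Linux that would be something like:

  sudo sysctl vm.swappiness=1                              # takes effect immediately
  echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf    # persists across reboots

(adjust to your distribution's preferred way of setting sysctls).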


Thanks,
Emir

On 15.12.2015 06:37, Mark Houts wrote:

I am running a SolrCloud 4.6 cluster with three solr nodes and three
external zookeeper nodes. Each Solr node has 12GB RAM. 8GB RAM dedicated to
the JVM.

When solr is started it consumes barely 1GB but over the course of 36 to 48
hours physical memory will be consumed and swap will be used. The i/o
latency of using swap will soon make the machine so slow that it will
become unresponsive.

Has anyone had experience with memory leaks in this version?

Regards,

M Houts



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Partial sentence match with block join

2015-12-15 Thread Upayavira
"Apple Computer Company" is a phrase query, meaning all the words must
appear, in that order. Adding phrase slop just allows adjustments to the
location of those words; it does not make them optional.

If you want them to be optional, then do it as a regular query. If you
want a phrase to score better, then add a phrase version boosted:

"apple computer company"^2 (apple computer company)

Upayavira

On Tue, Dec 15, 2015, at 07:35 AM, Yangrui Guo wrote:
> Hello
> 
> I've been using 5.3.1. I would like to enable this feature: when user
> enters a query, the results should include documents that also partially
> match the query. For example, the document is Apple Company
> and user query is "apple computer company". Though the document is
> missing
> the term "computer". I've tried phrase slop but it doesn't seem to be
> working with block join. How can I do this in solr?
> 
> Thanks
> 
> Yangrui


Solr Basic Configuration - Highlight - Beginner

2015-12-15 Thread Evert R.
Hi there!

It's my first installation, not sure if this is the right channel...

Here are my steps:

1. Set up a basic install of solr 5.4.0

2. Create a new core through command line (bin/solr create -c test)

3. Post 2 files: 1 .docx and 2 .pdf (bin/post -c test /docs/test/)

4. Query from the browser and it returns the correct results, but it does not
show the part of the text I am querying, the highlight.

  I have already flagged the 'hl' option. But still it does not work...

Example: I am looking for the word 'peace' in my pdf file (a book). I have 4
matches for this word; it shows me the book name (pdf file) but does not
show which part of the text contains the word 'peace'.


I am probably missing some configuration in schema.xml, which is missing
from my folder /solr/server/solr/test/conf/

Or even the solrconfig.xml...

I have read a bunch of things about highlighting, checked these files, copied
the standard schema.xml to my core/conf folder, but still it does not show the
highlight.


Attached a copy of my solrconfig.xml file.


I am very sorry for this probably dumb and too basic question... It's the
first time I have seen Solr live.


Any help will be appreciated.



Best regards,



*Evert Ramos*
*evert.ra...@gmail.com *


Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-15 Thread Mikhail Khludnev
I believe https://issues.apache.org/jira/browse/SOLR-5944 is the most
promising approach for such scenarios.
Even though it's not yet delivered in the distro.
We are going to publish a post about it at blog.griddynamics.com.

FWIW, I suppose EFF can be returned in result list.
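
For the record, an EFF setup looks roughly like this (all names here are just
examples):

  <fieldType name="eff_float" class="solr.ExternalFileField" keyField="id" defVal="0" valType="float"/>
  <field name="like_count" type="eff_float"/>

backed by a file named external_like_count in the core's data directory, which
you rewrite and then reload with a commit (or with the
org.apache.solr.schema.ExternalFileFieldReloader newSearcher listener).
Returning it in results is then just fl=id,likes:field(like_count).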



On Fri, Dec 11, 2015 at 1:48 PM, Vikram Parmar 
wrote:

> We are creating a web application which would contain posts (something like
> FB or say Youtube). For the stable part of the data (i.e. the facets, search
> results & its content), we plan to use SOLR.
>
> What should we use for the unstable part of the data (i.e. dynamic and
> volatile content such as Like counts, Comments counts, Viewcounts)?
>
>
> Option 1) Redis
>
> What about storing the "dynamic" data in a different data store (like
> Redis)? Thus, every time the counts get refreshed, I do not have to reindex
> the data into SOLR at all. Thus SOLR indexing is only triggered when new
> posts are added to the site, and never on any activity on the posts by the
> users.
>
> Side-note :-
> I also looked at the SOLR-Redis plugin at
> https://github.com/sematext/solr-redis
>
> The plugin looks good, but not sure if the plugin can be used to fetch the
> data stored in Redis as part of the solr result set, i.e. in docs. The
> description looks more like the Redis data can be used in the function
> queries for boosting, sorting, etc. Does anyone have experience with this?
>
>
> Option 2) SOLR NRT with Soft Commits
>
> We would depend on the in-built NRT features. Let's say we do soft-commits
> every second and hard-commits every 10 seconds. Suppose a huge amount of
> dynamic data is created on the site across hundreds of posts, e.g. 10
> likes across 1 posts. Thus, this would mean soft-committing on 1
> rows every second. And then hard-committing those many rows every 10
> seconds. Isn't this overkill?
>
>
> Which option is preferred? How would you compare both options in terms of
> scalability, maintenance, feasibility, best-practices, etc? Any real-life
> experiences or links to articles?
>
> Many thanks!
>
>
> p.s. EFF (external file fields) is not an option, as I read that the data
> in that file can only be used in function queries and cannot be returned as
> part of a document.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Issue in Geospatial Search

2015-12-15 Thread davidphilip cherian
What is the fieldType of the field "latlon" in the older schema as well as the
new schema?
Have you confirmed that both are the same?
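
For reference, in Solr 5.x a point field is normally declared with one of the
spatial types, e.g. (names here are illustrative):

  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
  <field name="latlon" type="location" indexed="true" stored="true"/>

or, for the prefix-tree variant:

  <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
             geo="true" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers"/>

The "does not support spatial filtering" error usually means the type behind
the field is not one of the spatial classes.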


On Tue, Dec 15, 2015 at 3:18 PM, Shenbagarajan 
wrote:

> Hello,
>
> I am trying to implement geo spatial search in solr by referring the below
> site.
> https://cwiki.apache.org/confluence/display/solr/Spatial+Search
>
> Every time I try to execute it I am getting the same error as below:
>  "msg":"The field latlon does not support spatial filtering",
>
> When I try to run the same query in Solr 4.10 it works fine without any
> issues. But in Solr 5.3 it's not working properly. Please point me to where
> the issue is, as I am stuck with it :(
>
> Thanks,
> Shen.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Issue-in-Geospatial-Search-tp4245441.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Issue in Geospatial Search

2015-12-15 Thread Shenbagarajan
Below is the configuration in my managedschema.xml






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Issue-in-Geospatial-Search-tp4245441p4245451.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrcloud used a lot of memory and memory keep increasing during long time run

2015-12-15 Thread zhenglingyun
Hi, list

I’m new to Solr. Recently I encountered a “memory leak” problem with SolrCloud.

I have two 64GB servers running a solrcloud cluster. In the solrcloud, I have
one collection with about 400k docs. The index size of the collection is about
500MB. Memory for solr is 16GB.

Following is "ps aux | grep solr” :

/usr/java/jdk1.7.0_67-cloudera/bin/java 
-Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager 
-Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true 
-Dsolr.hdfs.blockcache.direct.memory.allocation=true 
-Dsolr.hdfs.blockcache.blocksperbank=16384 -Dsolr.hdfs.blockcache.slab.count=1 
-Xms16608395264 -Xmx16608395264 -XX:MaxDirectMemorySize=21590179840 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled 
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC 
-Xloggc:/var/log/solr/gc.log 
-XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh 
-DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
 -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr 
-Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
 -Dsolr.authentication.simple.anonymous.allowed=true 
-Dsolr.security.proxyuser.hue.hosts=* -Dsolr.security.proxyuser.hue.groups=* 
-Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 
-Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983 
-Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
 -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984 
-Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr 
-Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true 
-Dsolr.hdfs.blockcache.direct.memory.allocation=true 
-Dsolr.hdfs.blockcache.blocksperbank=16384 -Dsolr.hdfs.blockcache.slab.count=1 
-Xms16608395264 -Xmx16608395264 -XX:MaxDirectMemorySize=21590179840 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled 
-XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled 
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC 
-Xloggc:/var/log/solr/gc.log 
-XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh 
-DzkHost=bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
 -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr 
-Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
 -Dsolr.authentication.simple.anonymous.allowed=true 
-Dsolr.security.proxyuser.hue.hosts=* -Dsolr.security.proxyuser.hue.groups=* 
-Dhost=bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 
-Dsolr.host=bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983 
-Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
 -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984 
-Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr 
-Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath 
/usr/lib/bigtop-tomcat/bin/bootstrap.jar 
-Dcatalina.base=/var/lib/solr/tomcat-deployment 
-Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/ 
org.apache.catalina.startup.Bootstrap start


solr version is solr4.4.0-cdh5.3.0
jdk version is 1.7.0_67

Soft commit time is 1.5s. And we have a real-time indexing/partial-updating rate 
of about 100 docs per second.

When freshly started, Solr uses about 500M of memory (the memory shown in the
Solr UI panel).
After several days of running, Solr runs into long GC pauses and stops
responding to user queries.

While Solr is running, the memory used by Solr keeps increasing up to some
large value, then decreases to
a low level (because of GC), keeps increasing to a larger value again,
then decreases to a low level again … and keeps
increasing to an even larger value … until Solr stops responding and I restart
it.


I don’t know how to solve this problem. Can you give me some advice?

Thanks.





Re: solrcloud used a lot of memory and memory keep increasing during long time run

2015-12-15 Thread Rahul Ramesh
You should actually decrease the Solr heap size. Let me explain a bit.

Solr requires very little heap memory for its operation and more memory for
storing data in main memory. This is because Solr uses mmap for storing the
index files.
Please check the link
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for
understanding how Solr operates on files.

Solr has the typical problem of garbage collection once you set the heap size
to a large value. It will have indeterminate pauses due to GC. The amount of
heap memory required is difficult to tell. However, the way we tuned this
parameter is by setting it to a low value and increasing it by 1Gb whenever
OOM is thrown.

Please check the problem of having large Java Heap

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap


Just for your reference, in our production setup, we have data of around
60Gb/node spread across 25 collections. We have configured 8GB as heap and
the rest of the memory we will leave it to OS to manage. We do around 1000
(search + Insert)/second on the data.
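
In a setup like the one in your ps output that would mean shrinking the
-Xms/-Xmx pair, e.g. (8g is only a starting point, tune it as described above):

  -Xms8g -Xmx8g

On a stock bin/solr install the equivalent is bin/solr start -m 8g, or setting
SOLR_HEAP in solr.in.sh.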

I hope this helps.

Regards,
Rahul



On Tue, Dec 15, 2015 at 4:33 PM, zhenglingyun  wrote:

> Hi, list
>
> I’m new to solr. Recently I encounter a “memory leak” problem with
> solrcloud.
>
> I have two 64GB servers running a solrcloud cluster. In the solrcloud, I
> have
> one collection with about 400k docs. The index size of the collection is
> about
> 500MB. Memory for solr is 16GB.
>
> Following is "ps aux | grep solr” :
>
> /usr/java/jdk1.7.0_67-cloudera/bin/java
> -Djava.util.logging.config.file=/var/lib/solr/tomcat-deployment/conf/logging.properties
> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr
> -Djava.net.preferIPv4Stack=true -Dsolr.hdfs.blockcache.enabled=true
> -Dsolr.hdfs.blockcache.direct.memory.allocation=true
> -Dsolr.hdfs.blockcache.blocksperbank=16384
> -Dsolr.hdfs.blockcache.slab.count=1 -Xms16608395264 -Xmx16608395264
> -XX:MaxDirectMemorySize=21590179840 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled
> -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
> -Xloggc:/var/log/solr/gc.log
> -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh -DzkHost=
> bjzw-datacenter-hadoop-160.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-163.d.yourmall.cc:2181,
> bjzw-datacenter-hadoop-164.d.yourmall.cc:2181/solr
> -Dsolr.solrxml.location=zookeeper -Dsolr.hdfs.home=hdfs://datacenter/solr
> -Dsolr.hdfs.confdir=/var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/hadoop-conf
> -Dsolr.authentication.simple.anonymous.allowed=true
> -Dsolr.security.proxyuser.hue.hosts=*
> -Dsolr.security.proxyuser.hue.groups=* -Dhost=
> bjzw-datacenter-solr-15.d.yourmall.cc -Djetty.port=8983 -Dsolr.host=
> bjzw-datacenter-solr-15.d.yourmall.cc -Dsolr.port=8983
> -Dlog4j.configuration=file:///var/run/cloudera-scm-agent/process/6288-solr-SOLR_SERVER/log4j.properties
> -Dsolr.log=/var/log/solr -Dsolr.admin.port=8984
> -Dsolr.max.connector.thread=1 -Dsolr.solr.home=/var/lib/solr
> -Djava.endorsed.dirs=/usr/lib/bigtop-tomcat/endorsed -classpath
> /usr/lib/bigtop-tomcat/bin/bootstrap.jar
> -Dcatalina.base=/var/lib/solr/tomcat-deployment
> -Dcatalina.home=/usr/lib/bigtop-tomcat -Djava.io.tmpdir=/var/lib/solr/
> org.apache.catalina.startup.Bootstrap start
>
>
> solr version is solr4.4.0-cdh5.3.0
> jdk version is 1.7.0_67
>
> Soft commit time is 1.5s. And we have real time indexing/partialupdating
> rate about 100 docs per second.
>
> When fresh 

Re: similarity as a parameter

2015-12-15 Thread Jack Krupansky
There should probably be some doc notes about this stuff, at a minimum
alerting the user to the prospect that changing the similarity for a field
(or the default for all fields) can require reindexing and when it is
likely to require reindexing. The Lucene-level Javadoc should probably say
these same things as well.
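
For reference, the kind of per-field configuration in question looks roughly like
this in schema.xml (the global SchemaSimilarityFactory plus a per-fieldType override;
the class and option below are only an illustration):

    <!-- global similarity that delegates to per-fieldType settings -->
    <similarity class="solr.SchemaSimilarityFactory"/>

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
      </analyzer>
      <!-- changing this class, or an option like discountOverlaps that affects
           index-time norms, generally means reindexing -->
      <similarity class="solr.DefaultSimilarityFactory">
        <bool name="discountOverlaps">false</bool>
      </similarity>
    </fieldType>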

-- Jack Krupansky

On Tue, Dec 15, 2015 at 2:42 PM, Chris Hostetter 
wrote:

>
> : Sweetspot does require reindexing but is that the only one? I have not
> : investigated some exotic implementations, anyone to confirm sweetspot is
> : the only one? In that case you could patch QueryComponent right, instead
> : of having a custom component?
>
> I'm not sure where this thread developed this weird assumption that
> switching from/to SweetSpotSimilarity in particular requires reindexing
> but that many/most other Similarities wouldn't require this ...
> SweetSpotSimilarity certainly has explicit config options for tuning the
> index time field norm, but it's not a special case...
>
> 1) Solr shouldn't make any naive assumptions about whatever
> arbitrary (custom) Similarity class a user might provide -- particularly
> when it comes to field norms, since all of the Similarity base classes
> / callers have been set up to make it trivial for people to write custom
> similarities for the express purpose of adjusting how many bits are used
> by field norms.
>
> 2) In both ClassicSimilarity and BM25Similarity (the new default in Solr6)
> the config option "discountOverlaps" impacts what norm values get encoded
> at index time for a given field length -- so it's possible to break things
> w/o even switching what class you use, w/o even considering custom Similarity
> impls (or new out-of-the-box similarity classes that might be added to
> Solr tomorrow)
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Partial sentence match with block join

2015-12-15 Thread Jack Krupansky
Set the default operator to OR and optionally set the mm parameter to 2 to
require at least two of the query terms to match, and don't quote the terms
as a phrase unless you want an exact (optionally sloppy) match.
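
For example, something along these lines (the field name is only a placeholder):

    /select?q=apple+computer+company&defType=edismax&qf=name&mm=2

With mm=2, a document containing only "apple" and "company" would still be returned,
while a strict AND of all three terms would not match it.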

Interesting example since I'll bet there are a lot of us who still think of
the company as being named "Apple Computer" even though they dropped
"Computer" from the name back in 2007. Also, it is "Inc.", not "Company",
so a proper search would be for "Apple Inc." or the old "Apple Computer,
Inc."


-- Jack Krupansky

On Tue, Dec 15, 2015 at 2:35 AM, Yangrui Guo  wrote:

> Hello
>
> I've been using 5.3.1. I would like to enable this feature: when user
> enters a query, the results should include documents that also partially
> match the query. For example, the document is Apple Company
> and user query is "apple computer company". Though the document is missing
> the term "computer". I've tried phrase slop but it doesn't seem to be
> working with block join. How can I do this in solr?
>
> Thanks
>
> Yangrui
>


Re: Partial sentence match with block join

2015-12-15 Thread Yangrui Guo
This will be a very common situation. Amazon and Google now display
keywords that are missing from the document. However, it seems that Solr's
parent-child structure requires using "AND" to confine all of the terms to a
single child document; otherwise it totally disregards the parent-child
structure. Is there a way to achieve this?
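
To make it concrete, the kind of query I mean is something like this (field names are
just placeholders):

    q={!parent which="doc_type:parent"}+name:apple +name:computer +name:company

which only returns a parent when every term occurs in the same child document.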

On Tuesday, December 15, 2015, Jack Krupansky 
wrote:

> Set the default operator to OR and optionally set the mm parameter to 2 to
> require at least two of the query terms to match, and don't quote the terms
> as a phrase unless you want an exact (optionally sloppy) match.
>
> Interesting example since I'll bet there are a lot of us who still think of
> the company as being named "Apple Computer" even though they dropped
> "Computer" from the name back in 2007. Also, it is "Inc.", not "Company",
> so a proper search would be for "Apple Inc." or the old "Apple Computer,
> Inc."
>
>
> -- Jack Krupansky
>
> On Tue, Dec 15, 2015 at 2:35 AM, Yangrui Guo  > wrote:
>
> > Hello
> >
> > I've been using 5.3.1. I would like to enable this feature: when user
> > enters a query, the results should include documents that also partially
> > match the query. For example, the document is Apple Company
> > and user query is "apple computer company". Though the document is
> missing
> > the term "computer". I've tried phrase slop but it doesn't seem to be
> > working with block join. How can I do this in solr?
> >
> > Thanks
> >
> > Yangrui
> >
>


Collection API migrate statement

2015-12-15 Thread philippa griggs
Hello,


Solr 5.2.1.


I'm using the Collections API MIGRATE statement in our test environment with a view to
implementing a hot/cold arrangement: newer documents will be kept in the Hot collection
and each night the oldest documents will be migrated into the Cold collection. I've got
it all working with a small number of documents (around 28,000).


I'm now trying to migrate around 200,000 documents and am getting a 'migrate the
collection time out:180s' message back.


The logs from the source collection are:


INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
org.apache.solr.cloud.OverseerCollectionProcessor; Successfully created replica 
of temp source collection on target leader node
INFO  - 2015-12-15 14:43:19.183; [HotSessions   ] 
org.apache.solr.cloud.OverseerCollectionProcessor; Requesting merge of temp 
source collection replica to target leader
INFO  - 2015-12-15 14:45:36.648; [   ] 
org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeDeleted fired on path 
/overseer/collection-queue-work/qnr-04 state SyncConnected
INFO  - 2015-12-15 14:45:36.651; [   ] 
org.apache.solr.cloud.DistributedQueue$LatchWatcher; NodeChildrenChanged fired 
on path /overseer/collection-queue-work state SyncConnected
ERROR - 2015-12-15 14:45:36.651; [   ] org.apache.solr.common.SolrException; 
org.apache.solr.common.SolrException: migrate the collection time out:180s
at 
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:237)
etc


The logs from the target collection are:

INFO  - 2015-12-15 14:43:19.128; [split_shard1_temp_shard2 shard1  
split_shard1_temp_shard2_shard1_replica2] org.apache.solr.update.UpdateLog; 
Took 22 ms to seed version buckets with highest version 1520634636692094979
INFO  - 2015-12-15 14:43:19.129; [split_shard1_temp_shard2 shard1  
split_shard1_temp_shard2_shard1_replica2] 
org.apache.solr.cloud.RecoveryStrategy; Finished recovery process. 
core=split_shard1_temp_shard2_shard1_replica2
INFO  - 2015-12-15 14:43:19.199; [   ] 
org.apache.solr.update.DirectUpdateHandler2; start mergeIndexes{}

As there are no errors in the target collection, am I right in assuming the timeout
occurred because the merge took too long? If so, how do I increase the timeout
period? Ideally I will need to migrate around 2 million
documents a night.
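
For reference, the call I'm issuing looks something like this (collection names and
the routing key are illustrative, not my real values):

    /admin/collections?action=MIGRATE&collection=HotSessions&target.collection=ColdSessions&split.key=oldkey!&forward.timeout=60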


Any help would be much appreciated.


Philippa




Re: Solr High Availability

2015-12-15 Thread Jack Krupansky
Solr Cloud provides HA when you configure at least two replicas for each
shard and have at least 3 zookeepers. That's it. No deck or detail document
is needed.
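
For example, a collection created with something like (names are placeholders):

    /admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&collection.configName=myconfig

running against a 3-node ZooKeeper ensemble (e.g. zk1:2181,zk2:2181,zk3:2181/solr)
can keep serving queries and updates when any single Solr node or ZooKeeper node
goes down.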



-- Jack Krupansky

On Tue, Dec 15, 2015 at 9:07 PM,  wrote:

> Hi Team,
>
> Can you help me in understanding in achieving the Solr High Availability .
>
> Appreciate you have a detail document or Deck on more details.
>
> Thank you
> Viswanath Bharathi
> Accenture | Delivery Centres for Technology in India
> CDC 2, Chennai, India
> Mobile: +91 9886259010
> www.accenture.com | www.avanade.com<
> http://www.avanade.com/>
>
>
>


Re: Solr High Availability

2015-12-15 Thread Peter Tan
Hi Jack, what happens when there is only one replica set up?

On Tue, Dec 15, 2015 at 9:32 PM, Jack Krupansky 
wrote:

> Solr Cloud provides HA when you configure at least two replicas for each
> shard and have at least 3 zookeepers. That's it. No deck or detail document
> is needed.
>
>
>
> -- Jack Krupansky
>
> On Tue, Dec 15, 2015 at 9:07 PM, 
> wrote:
>
> > Hi Team,
> >
> > Can you help me in understanding in achieving the Solr High Availability
> .
> >
> > Appreciate you have a detail document or Deck on more details.
> >
> > Thank you
> > Viswanath Bharathi
> > Accenture | Delivery Centres for Technology in India
> > CDC 2, Chennai, India
> > Mobile: +91 9886259010
> > www.accenture.com | www.avanade.com<
> > http://www.avanade.com/>
> >
> >
> >
>


Re: Highlighting large documents

2015-12-15 Thread Zheng Lin Edwin Yeo
Hi all,

Thank you for all the information.

I have set the hl.maxAnalyzedChars parameter to -1, and
the highlighting is working fine now.
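
For reference, my highlight query now looks something like this (the field name is
from my own setup):

    /select?q=test&hl=true&hl.fl=content&hl.maxAnalyzedChars=-1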

Regards,
Edwin


On 14 December 2015 at 18:03, Jens Brandt  wrote:

> Hi Edwin,
>
> you are limiting the portion of the document analyzed for highlighting in
> your solrconfig.xml by
>
>  <int name="hl.maxAnalyzedChars">100</int>
>
> Thus, snippets are only produced correctly if the query was found in the
> first 100 characters of the document.
>
> If you set this parameter to
>
>  <int name="hl.maxAnalyzedChars">-1</int>
>
> the original highlighter uses the whole document to find the snippet.
>
> I hope that helps
>   Jens
>
>
> > On 04.12.2015 at 16:51, Zheng Lin Edwin Yeo wrote:
> >
> > Hi,
> >
> > I'm using Solr 5.3.0
> >
> > I found that in large documents, sometimes I face the situation that when I do
> > a highlight query, the result set that is returned does not contain the
> > highlighted query. There are actually matches in the documents, but
> > they are located further back in the documents.
> >
> > I have tried to increase the value of the hl.maxAnalyzedChars, as the
> > default value is 51200, and I have documents that are much larger than
> > 51200 characters. This method works, but when I increase this
> > value, the performance of search and highlighting drops. It can drop from
> > less than 0.5 seconds to more than 10 seconds.
> >
> > Would like to check, is this method of increasing the value of the
> > hl.maxAnalyzedChars the best method to use, or is there other ways which
> > can solve the same purpose, but without affecting the performance much?
> >
> > Regards,
> > Edwin
>
>


Re: Security Problems

2015-12-15 Thread Jan Høydahl
Yes, that’s why I believe it should be:
1) if only authentication is enabled, all users must authenticate and all 
authenticated users can do anything.
2) if authz is enabled, then all users must still authenticate, and can by 
default do nothing at all, unless assigned proper roles
3) if a user is assigned the default “read” rule, and a collection adds a 
custom “/myselect” handler, that one is unavailable until the user gets it 
assigned
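
To make (2) and (3) concrete, a minimal security.json along those lines might look
roughly like this (the user, credentials hash and role names are placeholders only):

    {
      "authentication": {
        "class": "solr.BasicAuthPlugin",
        "credentials": { "solr": "<sha256-hash> <salt>" }
      },
      "authorization": {
        "class": "solr.RuleBasedAuthorizationPlugin",
        "permissions": [
          { "name": "read",   "role": "reader" },
          { "name": "update", "role": "writer" }
        ],
        "user-role": { "solr": ["reader", "writer"] }
      }
    }

Under the behaviour proposed above, a user with no role assigned would be able to do
nothing, and a custom handler like /myselect would stay closed until a permission
covering it was added.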

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 14. des. 2015 kl. 14.15 skrev Noble Paul :
> 
> ". If all paths were closed by default, forgetting to configure a path
> would not result in a security breach like today."
> 
> But it will still mean that unauthorized users are able to access,
> like guest being able to post to "/update". Just authenticating is not
> enough without proper authorization
> 
> On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl  wrote:
>>> 1) "read" should cover all the paths
>> 
>> This is very fragile. If all paths were closed by default, forgetting to 
>> configure a path would not result in a security breach like today.
>> 
>> /Jan
> 
> 
> 
> -- 
> -
> Noble Paul



Re: Security Problems

2015-12-15 Thread Upayavira
I concur - this makes sense.

On Tue, Dec 15, 2015, at 01:39 PM, Jan Høydahl wrote:
> Yes, that’s why I believe it should be:
> 1) if only authentication is enabled, all users must authenticate and all
> authenticated users can do anything.
> 2) if authz is enabled, then all users must still authenticate, and can
> by default do nothing at all, unless assigned proper roles
> 3) if a user is assigned the default “read” rule, and a collection adds a
> custom “/myselect” handler, that one is unavailable until the user gets
> it assigned
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> > 14. des. 2015 kl. 14.15 skrev Noble Paul :
> > 
> > ". If all paths were closed by default, forgetting to configure a path
> > would not result in a security breach like today."
> > 
> > But it will still mean that unauthorized users are able to access,
> > like guest being able to post to "/update". Just authenticating is not
> > enough without proper authorization
> > 
> > On Mon, Dec 14, 2015 at 3:59 PM, Jan Høydahl  wrote:
> >>> 1) "read" should cover all the paths
> >> 
> >> This is very fragile. If all paths were closed by default, forgetting to 
> >> configure a path would not result in a security breach like today.
> >> 
> >> /Jan
> > 
> > 
> > 
> > -- 
> > -
> > Noble Paul
> 


Solr High Availability

2015-12-15 Thread k.viswanath.bharathi
Hi Team,

Can you help me understand how to achieve Solr high availability?

I would appreciate a detailed document or deck with more details.

Thank you
Viswanath Bharathi
Accenture | Delivery Centres for Technology in India
CDC 2, Chennai, India
Mobile: +91 9886259010
www.accenture.com | 
www.avanade.com




This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Where allowed by local law, electronic 
communications with Accenture and its affiliates, including e-mail and instant 
messaging (including content), may be scanned by our systems for the purposes 
of information security and assessment of internal compliance with Accenture 
policy.
__

www.accenture.com


Re: NRT vs Redis for Dynamic Data in SOLR (like counts, viewcounts, etc) -

2015-12-15 Thread Vikram Parmar
Hi Mikhail,

Thanks for chiming in. Looking forward to your post regarding updatable
numeric DocValues.

What would be the 2nd most promising approach for now, would you say EFF
should be ok to go with?

Is updating and reloading the EFF external file (containing millions of lines)
at very short intervals fine? Say every 10 seconds?
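
For reference, I'm thinking of something along these lines (field and type names are
just placeholders):

    <!-- schema.xml: external file field keyed on the uniqueKey -->
    <fieldType name="viewCountType" class="solr.ExternalFileField"
               keyField="id" defVal="0" valType="pfloat"/>
    <field name="view_count" type="viewCountType" indexed="false" stored="false"/>

    <!-- solrconfig.xml: re-read external_view_count from the index data dir
         whenever a new searcher is opened -->
    <listener event="newSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>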

Thanks!

On Tue, Dec 15, 2015 at 5:46 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> I believe https://issues.apache.org/jira/browse/SOLR-5944 is the most
> promising approach for such scenarios.
> Despite it's not delivered in distro.
> We are going to publish a post about it at blog.griddynamics.com.
>
> FWIW, I suppose EFF can be returned in result list.
>
>
>
> On Fri, Dec 11, 2015 at 1:48 PM, Vikram Parmar 
> wrote:
>
> > We are creating a web application which would contain posts (something
> like
> > FB or say Youtube). For the stable part of the data (i.e.the facets,
> search
> > results & its content), we plan to use SOLR.
> >
> > What should we use for the unstable part of the data (i.e. dynamic and
> > volatile content such as Like counts, Comments counts, Viewcounts)?
> >
> >
> > Option 1) Redis
> >
> > What about storing the "dynamic" data in a different data store (like
> > Redis)? Thus, everytime the counts get refreshed, I do not have to
> reindex
> > the data into SOLR at all. Thus SOLR indexing is only triggered when new
> > posts are added to the site, and never on any activity on the posts by
> the
> > users.
> >
> > Side-note :-
> > I also looked at the SOLR-Redis plugin at
> > https://github.com/sematext/solr-redis
> >
> > The plugin looks good, but not sure if the plugin can be used to fetch
> the
> > data stored in Redis as part of the solr result set, i.e. in docs. The
> > description looks more like the Redis data can be used in the function
> > queries for boosting, sorting, etc. Anyone has experience with this?
> >
> >
> > Option 2) SOLR NRT with Soft Commits
> >
> > We would depend on the in-built NRT features. Let's say we do
> soft-commits
> > every second and hard-commits every 10 seconds. Suppose huge amount of
> > dynamic data is created on the site across hundreds of posts, e.g. 10
> > likes across 1 posts. Thus, this would mean soft-commiting on 1
> > rows every second. And then hard-commiting those many rows every 10
> > seconds. Isn't this overkill?
> >
> >
> > Which option is preferred? How would you compare both options in terms of
> > scalibility, maintenance, feasibility, best-practices, etc? Any real-life
> > experiences or links to articles?
> >
> > Many thanks!
> >
> >
> > p.s. EFF (external file fields) is not an option, as I read that the data
> > in that file can only be used in function queries and cannot be returned
> as
> > part of a document.
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


RE: Is DIH going to be removed from Solr future versions?

2015-12-15 Thread Davis, Daniel (NIH/NLM) [C]
I am aware of the problems with the implementation of DIH, but is there any 
problem with the XML driven data import capability?
Could it be rewritten (using modern XPath) to run as a part of SolrJ?

I've been interested in that, but I just haven't been able to shake loose the 
time.

-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Tuesday, December 15, 2015 5:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Is DIH going to be removed from Solr future versions?

I doubt DIH will be "removed". It more likely will be relegated - still there, 
but emphasised less.

Another possibility that has been mooted is to extract it, so that it can run 
outside of Solr. This strikes me as the best option. Having it run inside Solr 
strikes me as architecturally wrong, and also problematic in a SolrCloud world. 
Taking the DIH codebase and running it
*outside* Solr you get the best of DIH without the same set of issues.

Upayavira

On Tue, Dec 15, 2015, at 05:47 AM, Anil Cherian wrote:
> Dear Team,
> 
> I use DIH extensively and even wrote my own custom transformers in 
> some situations.
> Recently during an architecture discussion one of my team members told 
> that Solr is going to take away DIH from its future versions.
> 
> Is that true?
> 
> Also is using DIH for say 2 or 3 million docs a good option for 
> indexing an XML content data set. I am planning to use it either by 
> calling separate entities parallely or multiple /dataimport in 
> solrconfig.xml.
> 
> Cld you please reply at your earliest convenience as it is an 
> important decision for us to continue on DIH or not!
> 
> Thanks and Rgds,
> Anil.


similarity as a parameter

2015-12-15 Thread Dmitry Kan
Hi guys,

Is there a way to alter the similarity class at runtime, with a parameter?

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info


Re: Is DIH going to be removed from Solr future versions?

2015-12-15 Thread Erik Hatcher
With time shaken loose, IMO ideally what we do (probably under
https://issues.apache.org/jira/browse/SOLR-7188) is create an update
processor that *forwards* to a _real_ Solr collection update handler, and fire
up EmbeddedSolrServer in a client-side command-line tool that can run
/update/extract, DIH stuff, etc. - it would do what it does now to extract, parse,
and build documents, and then forward them via javabin to a live Solr collection.
I'm not sure that SOLR-7188 currently spells it out like that, but it is a
nice, clean, straightforward path from DIH and Tika embedded inside a real Solr
cluster to leveraging and scaling them on their own. We'd lose the DIH admin UI,
but that's OK by me.
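
A very rough sketch of the forwarding end of such a tool, with SolrJ (class, field
and collection names are only illustrative):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ForwardingTool {
      public static void main(String[] args) throws Exception {
        // connect to the live cluster via ZooKeeper
        CloudSolrClient solr = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
        solr.setDefaultCollection("mycollection");

        // in the real tool this document would come out of the embedded
        // DIH / Tika (/update/extract) pipeline rather than being built by hand
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");

        solr.add(doc);      // sent to the collection via javabin
        solr.commit();
        solr.close();
      }
    }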

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 



> On Dec 15, 2015, at 9:23 AM, Davis, Daniel (NIH/NLM) [C] 
>  wrote:
> 
> I am aware of the problems with the implementation of DIH, but is there any 
> problem with the XML driven data import capability?
> Could it be rewritten (using modern XPath) to run as a part of SolrJ?
> 
> I've been interested in that, but I just haven't been able to shake loose the 
> time.
> 
> -Original Message-
> From: Upayavira [mailto:u...@odoko.co.uk] 
> Sent: Tuesday, December 15, 2015 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is DIH going to be removed from Solr future versions?
> 
> I doubt DIH will be "removed". It more likely will be relegated - still 
> there, but emphasised less.
> 
> Another possibility that has been mooted is to extract it, so that it can run 
> outside of Solr. This strikes me as the best option. Having it run inside 
> Solr strikes me as architecturally wrong, and also problematic in a SolrCloud 
> world. Taking the DIH codebase and running it
> *outside* Solr you get the best of DIH without the same set of issues.
> 
> Upayavira
> 
> On Tue, Dec 15, 2015, at 05:47 AM, Anil Cherian wrote:
>> Dear Team,
>> 
>> I use DIH extensively and even wrote my own custom transformers in 
>> some situations.
>> Recently during an architecture discussion one of my team members told 
>> that Solr is going to take away DIH from its future versions.
>> 
>> Is that true?
>> 
>> Also is using DIH for say 2 or 3 million docs a good option for 
>> indexing an XML content data set. I am planning to use it either by 
>> calling separate entities parallely or multiple /dataimport in 
>> solrconfig.xml.
>> 
>> Cld you please reply at your earliest convenience as it is an 
>> important decision for us to continue on DIH or not!
>> 
>> Thanks and Rgds,
>> Anil.



Re: Solr 5 upgrade

2015-12-15 Thread bharat jangid
richardg  dvdempire.com> writes:

> 
> Ubuntu 14.04.02
> Trying to install solr 5 following this:
> 
> https://cwiki.apache.org/confluence/display/solr/Upgrading+a+Solr+4.x+Cluster+to+Solr+5.0
> 
> I keep getting "this script requires extracting a war file with either 
the
> jar or unzip utility, please install these utilities or contact your
> administrator for assistance." after running install_solr_service.sh.  
It
> says "Service solr installed." but when I try to run the service I get 
the
> above error.  Not sure the resolution.
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Solr-5-upgrade-tp4192127.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
>

Please check JAVA_HOME. It seems Java is not installed on the machine, which is
causing this issue.

You can confirm it by running java -version at the command prompt.







Re: Partial sentence match with block join

2015-12-15 Thread Upayavira

Can you give an example? I cannot understand what you mean from your
description below.

Thx!

On Wed, Dec 16, 2015, at 12:42 AM, Yangrui Guo wrote:
> This will be a very common situation. Amazon and Google now display
> keywords missing in the document. However it seems that Solr parent-child
> structure requires to use "AND" to confine all terms appear inside a
> single
> child document, otherwise it will totally disregard the parent-child
> structure. Is there a way to achieve this?
> 
> On Tuesday, December 15, 2015, Jack Krupansky 
> wrote:
> 
> > Set the default operator to OR and optionally set the mm parameter to 2 to
> > require at least two of the query terms to match, and don't quote the terms
> > as a phrase unless you want an exact (optionally sloppy) match.
> >
> > Interesting example since I'll bet there are a lot of us who still think of
> > the company as being named "Apple Computer" even though they dropped
> > "Computer" from the name back in 2007. Also, it is "Inc.", not "Company",
> > so a proper search would be for "Apple Inc." or the old "Apple Computer,
> > Inc."
> >
> >
> > -- Jack Krupansky
> >
> > On Tue, Dec 15, 2015 at 2:35 AM, Yangrui Guo  > > wrote:
> >
> > > Hello
> > >
> > > I've been using 5.3.1. I would like to enable this feature: when user
> > > enters a query, the results should include documents that also partially
> > > match the query. For example, the document is Apple Company
> > > and user query is "apple computer company". Though the document is
> > missing
> > > the term "computer". I've tried phrase slop but it doesn't seem to be
> > > working with block join. How can I do this in solr?
> > >
> > > Thanks
> > >
> > > Yangrui
> > >
> >


[ANNOUNCE] Apache Solr Ref Guide for v5.4

2015-12-15 Thread Cassandra Targett
The Lucene PMC is pleased to announce the release of the Apache Solr
Reference Guide for Solr 5.4.

This 598 page PDF is the definitive guide for Solr, written and edited by
the Solr committer community. You can download it from:

https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/

- Cassandra


Re: Solr High Availability

2015-12-15 Thread Jack Krupansky
There is no HA with a single replica for each shard. Replication factor
must be at least 2 for HA.

-- Jack Krupansky

On Wed, Dec 16, 2015 at 12:38 AM, Peter Tan  wrote:

> Hi Jack, What happens when there is only one replica setup?
>
> On Tue, Dec 15, 2015 at 9:32 PM, Jack Krupansky 
> wrote:
>
> > Solr Cloud provides HA when you configure at least two replicas for each
> > shard and have at least 3 zookeepers. That's it. No deck or detail
> document
> > is needed.
> >
> >
> >
> > -- Jack Krupansky
> >
> > On Tue, Dec 15, 2015 at 9:07 PM, 
> > wrote:
> >
> > > Hi Team,
> > >
> > > Can you help me in understanding in achieving the Solr High
> Availability
> > .
> > >
> > > Appreciate you have a detail document or Deck on more details.
> > >
> > > Thank you
> > > Viswanath Bharathi
> > > Accenture | Delivery Centres for Technology in India
> > > CDC 2, Chennai, India
> > > Mobile: +91 9886259010
> > > www.accenture.com | www.avanade.com<
> > > http://www.avanade.com/>
> > >
> > >
> > >
> >
>