Re: sorting using org.apache.solr.client.solrj.SolrQuery not working

2013-09-13 Thread Shawn Heisey
On 9/13/2013 6:56 PM, suren wrote:
> I tried the 3 methods below to sort the output from Solr 4.3.1; there is no
> error, but the results are not sorted on any given field.
> 1) addSort(field, order)
> 2) addOrUpdateSort(field, order)
> 3) setSort(field, order)
> 
> My schema settings for the fields I tried are:
> <field ... multiValued="false"/>
> <field ... multiValued="false"/>
> <field ... multiValued="false"/>
> 
> Can anyone tell me why the sorting is not working?

Here's an example of how to do a sort with SolrJ, assuming query is a
SolrQuery object:

query.setSort("LAST_NAM", ORDER.asc);

You'll need this import:

import org.apache.solr.client.solrj.SolrQuery.ORDER;

For the example I've just given, you should see "sort=LAST_NAM asc" in
your solr log in the parameter list for that query.
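For completeness, here is a minimal end-to-end sketch (untested; the server
URL and the LAST_NAM field are just placeholders from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// build a match-all query sorted ascending on LAST_NAM
HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
SolrQuery query = new SolrQuery("*:*");
query.setSort("LAST_NAM", ORDER.asc);
QueryResponse rsp = server.query(query);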

If that doesn't seem to work, what are you actually seeing?

Thanks,
Shawn



Solr Patent

2013-09-13 Thread Zaizen Ushio
Hello
I have a question about patents.  I believe the Apache license protects Solr 
developers from patent issues within the Solr community.  But are there any 
cases where Solr developers or Solr users have faced patent claims from 
outside the Solr community?  Has anybody experienced such a case?  Any advice 
is appreciated.

Thanks,  Zaizen




sorting using org.apache.solr.client.solrj.SolrQuery not working

2013-09-13 Thread suren
I tried the 3 methods below to sort the output from Solr 4.3.1; there is no
error, but the results are not sorted on any given field.
1) addSort(field, order)
2) addOrUpdateSort(field, order)
3) setSort(field, order)

My schema settings for the fields I tried are:
<field ... multiValued="false"/>
<field ... multiValued="false"/>
<field ... multiValued="false"/>

Can anyone tell me why the sorting is not working?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/sorting-using-org-apache-solr-client-solrj-SolrQuery-not-working-tp4089985.html
Sent from the Solr - User mailing list archive at Nabble.com.


Early Access Release #7 for Solr 4.x Deep Dive is now available for download on Lulu.com

2013-09-13 Thread Jack Krupansky
Okay, it's hot off the e-presses: my updated book Solr 4.x Deep Dive, Early 
Access Release #7 is now available for purchase and download as an e-book 
for $9.99 on Lulu.com at:


http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21120181.html

(That link says "release-1", but it apparently correctly redirects to EAR 
#7.)


Summary of changes:

* Coverage of Collections API (reference for SolrCloud, but tutorial with 
examples is TBD)

* Coverage of File Access API

Total of 34 pages of additional content.

Please feel free to email or comment on my blog, 
http://basetechnology.blogspot.com/, for any questions or issues related to 
the book.

Thanks!

-- Jack Krupansky 



Re: Solr 4.5 spatial search - distance and score

2013-09-13 Thread David Smiley (@MITRE.org)
Hi Weber,

Returning the distance separately from the score is really awkward without
being able to use geodist() (which is coming in Solr 4.5 for the RPT spatial
field).  But as you note in SOLR-4255 it is possible. If you modify the Solr
example schema so that the 'store' spatial field is of type location_rpt,
then your field list ('fl') parameter could do it like this:

fl=*,score,dist:query({!geofilt v='' filter=false score=distance
sfield=store pt=-19.9240936,-43.9373343 d=200})

Here, 'd' isn't in effect but is required, and query() seems to demand 'v'. 
You can probably make the standard geofilt parameters top-level request
parameters and thus share them between this sort and a spatial filter in an
'fq'.  
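For example, something like this (an untested sketch of that suggestion):

sfield=store&pt=-19.9240936,-43.9373343&d=200
&fq={!geofilt}
&fl=*,score,dist:query({!geofilt v='' filter=false score=distance})

Here {!geofilt} picks up sfield/pt/d from the top-level request parameters,
so the filter and the pseudo-field stay in sync.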

~ David


Weber wrote
> I'm trying to get score by using a custom boost and also get the distance.
> I found David's code* to get it using "Intersects", which I want to
> replace by {!geofilt} or geodist()
> 
> *David's code: https://issues.apache.org/jira/browse/SOLR-4255
> 
> He told me geodist() will be available again for this kind of field, which
> is a geohash type.
> 
> Then, I'd like to know how it can be done today on 4.4 with {!geofilt} and
> how it will be done on 4.5 using geodist()
> 
> Thanks in advance.





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-5-spatial-search-distance-and-score-tp4089706p4089970.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wrapper for SOLR for Compression

2013-09-13 Thread Chris Hostetter

: I asked this before... But can we add a parameter for SOLR to expose the
: compression modes to solrconfig.xml ?

Bill: note my previous response...

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201304.mbox/%3Calpine.DEB.2.02.1304251901350.3628@frisbee%3E

...because of how the codecs are set up, there isn't anything (that I can 
see) that Solr could easily do with such a setting to instruct the codec to 
use a different compression level.

You would need to write a new custom codec that uses the compression level 
you want, and configure it that way.

We could probably ship a bunch of CompressingStoredFieldsFormat and 
CompressingTermVectorsFormat variants, each using a different compression 
level, to make configuration easier -- but it would not be a trivial new 
option to add.


-Hoss


Empty out a multiValue Field using SolrJ

2013-09-13 Thread edo
Hi all,
I am facing an issue while trying to update a document.
I am using SolrJ to add/update documents in my collection. The SolrJ
version is 4.0.0 (but I also tried with the latest, 4.4.0).

I am aware that in multivalued fields I can only add an element but not
remove one, so for those fields I am overwriting all the values every time.
Everything worked great until I found a corner case: emptying out a
multivalued field.
In this case the values stay there and the field is not emptied.

I tried to look through the previous ML threads and I found something useful
here:
http://www.searchworkings.org/forum/-/message_boards/view_message/585466

Using curl I posted a JSON update and the field was emptied. However,
when I tried to apply the same solution using SolrJ I didn't get the
expected result.

This is the code snippet:

 Map<String, Object> element = new HashMap<String, Object>(1);
 element.put("set", null);
 this.addField("fieldname", element);

Am I doing something wrong? Did anyone else face the same issue?

Thanks in advance,
Edo


Committing when indexing in parallel

2013-09-13 Thread Phani Chaitanya

I'm wondering what happens to a commit while we are indexing in parallel in
Solr. Are indexing update requests blocked until the commit finishes?

Let's say I have a process P1 which issued a commit request, and there is
another process P2 which is still indexing to the same index. What happens
to the index in that scenario? Are P2's indexing requests blocked until P1's
commit request finishes?

I'm just wondering what the behavior of Solr is in the above case.



-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Committing-when-indexing-in-parallel-tp4089953.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: "Unable to connect" to "http://localhost:8983/solr/"

2013-09-13 Thread Raheel Hasan
Ok, I have solved it myself. The issue was in the "data" directory of
"solr/{myCore}/". I deleted this folder and it started running again.

However, this is an even bigger issue now, because when the project is LIVE
and has indexed millions of records, I won't have the option to remove
the "data" folder again.

So is there a different solution here? How do I save the indexes?



On Fri, Sep 13, 2013 at 11:45 AM, Raheel Hasan wrote:

> ?? anyone?
>
>
> On Thu, Sep 12, 2013 at 8:12 PM, Raheel Hasan 
> wrote:
>
>> Hi,
>>
>> I just had this issue come out of nowhere.
>> Everything was fine until all of a sudden the browser can't connect to
>> this Solr.
>>
>>
>> Here is the solr log:
>>
>> INFO  - 2013-09-12 20:07:58.142; org.eclipse.jetty.server.Server;
>> jetty-8.1.8.v20121106
>> INFO  - 2013-09-12 20:07:58.179;
>> org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
>> E:\Projects\G1\A1\trunk\solr_root\solrization\contexts at interval 0
>> INFO  - 2013-09-12 20:07:58.191;
>> org.eclipse.jetty.deploy.DeploymentManager; Deployable added:
>> E:\Projects\G1\A1\trunk\solr_root\solrization\contexts\solr-jetty-context.xml
>> INFO  - 2013-09-12 20:07:59.159;
>> org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
>> /solr, did not find org.apache.jasper.servlet.JspServlet
>> INFO  - 2013-09-12 20:07:59.189;
>> org.eclipse.jetty.server.handler.ContextHandler; started
>> o.e.j.w.WebAppContext{/solr,file:/E:/Projects/G1/A1/trunk/solr_root/solrization/solr-webapp/webapp/},E:\Projects\G1\A1\trunk\solr_root\solrization/webapps/solr.war
>> INFO  - 2013-09-12 20:07:59.190;
>> org.eclipse.jetty.server.handler.ContextHandler; started
>> o.e.j.w.WebAppContext{/solr,file:/E:/Projects/G1/A1/trunk/solr_root/solrization/solr-webapp/webapp/},E:\Projects\G1\A1\trunk\solr_root\solrization/webapps/solr.war
>> INFO  - 2013-09-12 20:07:59.206;
>> org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init()
>> INFO  - 2013-09-12 20:07:59.231; org.apache.solr.core.SolrResourceLoader;
>> JNDI not configured for solr (NoInitialContextEx)
>> INFO  - 2013-09-12 20:07:59.231; org.apache.solr.core.SolrResourceLoader;
>> solr home defaulted to 'solr/' (could not find system property or JNDI)
>> INFO  - 2013-09-12 20:07:59.241;
>> org.apache.solr.core.CoreContainer$Initializer; looking for solr config
>> file: E:\Projects\G1\A1\trunk\solr_root\solrization\solr\solr.xml
>> INFO  - 2013-09-12 20:07:59.244; org.apache.solr.core.CoreContainer; New
>> CoreContainer 24012447
>> INFO  - 2013-09-12 20:07:59.244; org.apache.solr.core.CoreContainer;
>> Loading CoreContainer using Solr Home: 'solr/'
>> INFO  - 2013-09-12 20:07:59.245; org.apache.solr.core.SolrResourceLoader;
>> new SolrResourceLoader for directory: 'solr/'
>> INFO  - 2013-09-12 20:07:59.483;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> socketTimeout to: 0
>> INFO  - 2013-09-12 20:07:59.484;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> urlScheme to: http://
>> INFO  - 2013-09-12 20:07:59.485;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> connTimeout to: 0
>> INFO  - 2013-09-12 20:07:59.486;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> maxConnectionsPerHost to: 20
>> INFO  - 2013-09-12 20:07:59.487;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> corePoolSize to: 0
>> INFO  - 2013-09-12 20:07:59.488;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> maximumPoolSize to: 2147483647
>> INFO  - 2013-09-12 20:07:59.489;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> maxThreadIdleTime to: 5
>> INFO  - 2013-09-12 20:07:59.490;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> sizeOfQueue to: -1
>> INFO  - 2013-09-12 20:07:59.490;
>> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
>> fairnessPolicy to: false
>> INFO  - 2013-09-12 20:07:59.498;
>> org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client,
>> config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
>> INFO  - 2013-09-12 20:07:59.671; org.apache.solr.core.CoreContainer;
>> Registering Log Listener
>> INFO  - 2013-09-12 20:07:59.689; org.apache.solr.core.CoreContainer;
>> Creating SolrCore 'A1' using instanceDir: solr\A1
>> INFO  - 2013-09-12 20:07:59.690; org.apache.solr.core.SolrResourceLoader;
>> new SolrResourceLoader for directory: 'solr\A1\'
>> INFO  - 2013-09-12 20:07:59.724; org.apache.solr.core.SolrConfig; Adding
>> specified lib dirs to ClassLoader
>> INFO  - 2013-09-12 20:07:59.726; org.apache.solr.core.SolrResourceLoader;
>> Adding
>> 'file:/E:/Projects/G1/A1/trunk/solr_root/solrization/lib/mysql-connector-java-5.1.25-bin.jar'
>> to classloader
>> INFO  - 2013-09-12 20:07:59.727; org.apache.solr.core.SolrResourceLoader;
>> Adding
>> 'file:/E:/Projects/G1/A1/trunk/solr_root/contrib/dataimporth

Re: Get the commit time of a document in Solr

2013-09-13 Thread phanichaitanya
Thanks Otis. I'll look into it if I can use it to solve my problem.




-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089949.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Empty out a multiValue Field using SolrJ

2013-09-13 Thread Chris Hostetter

: i am facing an issue while trying to update a document.
: I am using SolrJ to add/update documents of my collection. The SolrJ

: I am aware that in multivalue fields i can only add an element but not
: remove, in fact for those fields I am overriding all the values every time.
: Everything worked great until I found a corner case: empty out a multivalue
: field.
: In this case values stay there and the field is not emptied.

...

: This is the code snippet:
: 
:  Map element = new HashMap(1);
:  element.put("set", null);
:  this.addField("fieldname", element);

just to be clear:

1) is "this" in your above code a SolrInputDocument object?  are you 
subclassing SolrInputDocument for some reason? can you explain why?

2) IIUC, you are using SolrJ to fetch a document from Solr, and you then 
want to use atomic updates to modify that document, and one of the 
modifications you are attempting is to empty out this multivalued field -- 
correct?   But if the field currently has values, then "addField" is going 
to add your 1-element map to that existing list of values.  Have you tried 
using SolrInputDocument.setField("fieldname", element); to *replace* all 
of the existing values with the instruction to set the field (on the server 
side) to null?

If I'm misunderstanding any part of your question, please post a more 
complete (ideally: runnable) code example showing everything you are 
doing.
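
For reference, an untested sketch of that setField() approach (the document 
id and field name here are placeholders):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc1");                  // unique key of the doc to update
Map<String, Object> op = new HashMap<String, Object>(1);
op.put("set", null);                         // atomic "set to null" removes all values
doc.setField("fieldname", op);               // setField replaces, addField appends
server.add(doc);
server.commit();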


-Hoss


Re: spellcheck causing Core Reload to hang

2013-09-13 Thread Chris Hostetter

: after a lot of investigation today, I found that it's the spellcheck
: component which is causing the issue. If it's turned off, all runs well
: and the core can easily reload. However, when the spellcheck is on, the
: core won't reload and instead hangs forever.

Can you take some stack traces while the server is hung?

Do you have any firstSearcher or newSearcher warming queries configured?  
If so can you try adding "spellcheck=false" to those warming queries and 
see if it eliminates the problem?
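
For reference, a warming query with spellcheck disabled might look something 
like this in solrconfig.xml (the query string itself is just a placeholder):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static warming query</str>
      <str name="spellcheck">false</str>
    </lst>
  </arr>
</listener>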

Smells like this thread...
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201309.mbox/%3Calpine.DEB.2.02.1309061149310.10818@frisbee%3E


...would be good to get a jira open with a reproducible set of configs 
that demonstrates the problem semi-reliably..


-Hoss


what does "UnInvertedField; UnInverted multi-valued field" means and how to fix it

2013-09-13 Thread Raheel Hasan
Hi guys,

I have an issue here involving the Solr core and data indexing:

When I build an index from a fresh setup, everything is fine: all queries
and additional/update indexing run fine. But when I reload the core, Solr
stops from that point onward, forever.

All I get is this line as the last line of the Solr log after the issue has
occurred:

UnInvertedField; UnInverted multi-valued field
{field=prod_cited_id,memSize=4880,tindexSize=40,time=4,phase1=4,nTerms=35,bigTerms=4,termInstances=36,uses=0}

Furthermore, the only way to get things working again is to delete the
"data" folder inside "solr/{myCore}/"...


So can anyone help me beat this issue and get things working again? I can't
afford this issue when the system is LIVE.

Thanks a lot.

-- 
Regards,
Raheel Hasan


Re: Storing/indexing speed drops quickly

2013-09-13 Thread Shawn Heisey

On 9/13/2013 12:03 AM, Per Steffensen wrote:

What is it that will fill my heap? I am trying to avoid the FieldCache.
For now, I am actually not doing any searches - focus on indexing for
now - and certainly not group/facet/sort searches that will use the
FieldCache.


I don't know what makes up the heap when you have lots of documents.  I 
am not really using any RAM hungry features and I wouldn't be able to 
get away with a 4GB heap on my Solr servers.  Uncollectable (and 
collectable) RAM usage is heaviest during indexing.  I sort on one or 
two fields and we don't use facets.


Here's a screenshot of my index status page showing how big my indexes 
are on each machine, it's a couple of months old now.  These machines 
have a 6GB heap, and I don't dare make it any smaller, or I'll get OOM 
errors during indexing.  They have 64GB total RAM.


https://dl.dropboxusercontent.com/u/97770508/statuspagescreenshot.png


More RAM will probably help, but only for a while. I want billions of
documents in my collections - and also on each machine. Currently we are
aiming at 15 billion documents per month (500 million per day) and will keep
at least two years of data in the system. Currently we use one collection
for each month, so when the system has been running for two years it
will be 24 collections with 15 billion documents each. Indexing will
only go on in the collection corresponding to the "current" month, but
searching will (potentially) be across all 24 collections. The documents
are very small. I know that 6 machines will not do in the long run -
currently this is only testing - but the number of machines should not be
higher than about 20-40. In general, it is a problem if Solr/Lucene does
not perform fairly well when the data does not fit in RAM - then it cannot
really be used for "big data". I would have to buy hundreds or even
thousands of machines with 64GB+ RAM. That is not realistic.


To lower your overall RAM requirements, use SSD, and store as little 
data as possible - ideally only the id used to retrieve data from another 
source.  You'll probably still want disk cache equal to 10-25% of your 
index size.  With regular disks, that's 50-100%.
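
For example: with a 100GB index on SSD, that works out to roughly 10-25GB of 
RAM set aside for the OS disk cache; on spinning disks, plan on 50-100GB.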


Put your OS and Solr itself on regular disks in RAID1 and your Solr data 
on the SSD.  Due to the eventual decay caused by writes, SSD will 
eventually die, so be ready for SSD failures to take out shard replicas. 
 So far I'm not aware of any RAID solutions that offer TRIM support, 
and without TRIM support, an SSD eventually has performance problems. 
Without RAID, a failure will take out that replica.  That's one of the 
points of SolrCloud - having replicas so single failures don't bring 
down your index.


If you can't use SSD or get tons of RAM, you're going to have 
performance problems.  Solr (and any other Lucene-based search product) 
does really well with super-large indexes if you have the system 
resources available.  If you don't, it sucks.


Thanks,
Shawn



Re: Best configuration for 2 servers

2013-09-13 Thread Shawn Heisey

On 9/13/2013 12:50 PM, Branham, Jeremy [HR] wrote:

Does this sound appropriate then? [assuming no 3rd server]

Server A:
Zoo Keeper
SOLR with 1 shard

Server B:
SOLR with ZK Host parameter set to Server A


Yes, that will work, but if the ZK on server A goes down, the entire 
cloud is down.


When you create a collection with replicationFactor=2, one replica will 
be on server A and one replica will be on server B.


If you want to break the index up into multiple shards, you can, you'll 
also need the maxShardsPerNode parameter when you create the collection, 
and all shards will have replicas on both machines.


A note about zookeeper and redundancy, and an explanation about why 3 
hosts are required:  To form a quorum, zookeeper must have the votes of 
a majority of the hosts in the ensemble.  If there are only two hosts, 
it's not possible for there to be a majority unless both hosts are up, 
so two hosts is actually worse than one.  You need to either have one ZK 
node or at least three, preferably an odd number.
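
For example: with 3 hosts a quorum is 2, so one host can fail; with 5 hosts 
a quorum is 3, so two can fail; with 2 hosts a quorum is still 2, so neither 
can fail.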


Thanks,
Shawn



Re: Stop filter changes in Solr >= 4.4

2013-09-13 Thread Yonik Seeley
On Fri, Sep 13, 2013 at 1:07 AM, Shalin Shekhar Mangar wrote:
> AFAIK, enablePositionIncrements=false is deprecated in 4.x but not
> removed. It will be removed in 5.0 though.

Hmmm, I had missed that.

Anyone have pointers to an example of what "broken" means and why it
can't be fixed?
It seems pretty extreme just to remove this functionality that has
been possible OOTB for 10 years.

-Yonik
http://lucidworks.com


Re: Different Responses for 4.4 and 3.5 solr index

2013-09-13 Thread Jack Krupansky
I don't have any additional questions, and won't, until you are able to 
supply the information requested in my previous response.


-- Jack Krupansky

-Original Message- 
From: Kuchekar

Sent: Friday, September 13, 2013 1:46 PM
To: solr-user@lucene.apache.org
Subject: Re: Different Responses for 4.4 and 3.5 solr index

Hi,

   Following are the debug query results:

*Solr 3.5*

60.67038 = sum of:
  60.67038 = max plus 1.0 times others of:
    0.44362593 = weight(content:cancer^0.5 in 21506339), product of:
      0.009291923 = queryWeight(content:cancer^0.5), product of:
        0.5 = boost
        3.5684927 = idf(docFreq=1682287, maxDocs=21947370)
        0.005207758 = queryNorm
      47.74318 = fieldWeight(content:cancer in 21506339), product of:
        13.379088 = tf(termFreq(content:cancer)=179)
        3.5684927 = idf(docFreq=1682287, maxDocs=21947370)
        1.0 = fieldNorm(field=content, doc=21506339)


*Solr 4.4 debug query:*

67.04259 = max plus 1.0 times others of:
  0.75314933 = weight(content:cancer^0.5 in 20543947) [DefaultSimilarity], result of:
    0.75314933 = score(doc=20543947, freq=515.0 = termFreq=515.0), product of:
      0.009295603 = queryWeight, product of:
        0.5 = boost
        3.5702603 = idf(docFreq=1678887, maxDocs=21941764)
        0.005207241 = queryNorm
      81.0221 = fieldWeight in 20543947, product of:
        22.693611 = tf(freq=515.0), with freq of:
          515.0 = termFreq=515.0
        3.5702603 = idf(docFreq=1678887, maxDocs=21941764)
        1.0 = fieldNorm(doc=20543947)

A search for the term 'cancer' in the field 'content' shows me the count to
be 515.

Please let me know if you have any questions or concerns.

Thanks.
Kuchekar, Nilesh


On Fri, Sep 13, 2013 at 12:36 AM, Jack Krupansky wrote:



There may be some token filters that are emitting a different number of
terms. There are so many changes between 3.5 and 4.4, that it simply isn't
worth the trouble to track down all of them. In some cases, there may be
bugs in 3.5 that have gotten fixed in any of the intervening releases.

Do you have a specific example - the input text and the field and field
type and analyzer where the tf differs? That should suggest where the
differences come from.

Do you have any specific reason to believe that one of the counts is more
right than the other?

-- Jack Krupansky

-Original Message- From: Kuchekar
Sent: Thursday, September 12, 2013 4:50 PM

To: solr-user@lucene.apache.org
Cc: Stefan Matheis
Subject: Re: Different Responses for 4.4 and 3.5 solr index

Hi,

After triaging this further, we find that the termFrequency (tf) for
the same field in the same doc in solr 3.5 and 4.4 is different.

example :

If word "fruits" appear in some field for 20 times

In 3.5 tf is reported to be 8, where as in 4.4 solr it reports to be 20.
that is changing the the score.

Also we see that the function 'idf' which depends upon the max doc is
changed.

Are there any changes in 'termFrequency' and 'idf' function in solr 4.4
compared to solr 3.5.

Looking forward for your reply.

Thanks.
Kuchekar, Nilesh


On Thu, Sep 12, 2013 at 11:30 AM, Kuchekar wrote:

 Hi,


Any updates on this?. Is ranking computation dependent on the 
'maxDoc'

value in the solr? Is this happening due to changing value of 'maxDoc'
value after each optimization. As in, in solr 4.4 eve

Re: Different Responses for 4.4 and 3.5 solr index

2013-09-13 Thread Kuchekar
Hi,

The input text = 'cancer', field = 'content', field type = 'text_general'.

The analyzer for the field is as follows (most of the definition was
stripped in the archive):

<filter class="solr.LowerCaseFilterFactory"/>

A manual search count of the term 'cancer' in the field 'content' gave me
515 hits. This makes me believe that the tf in 4.4 is correct compared to
that in 3.5.

Thanks.
Kuchekar, Nilesh


On Fri, Sep 13, 2013 at 2:02 PM, Jack Krupansky wrote:

> I don't have any additional questions, and won't, until you are able to
> supply the information requested in my previous response.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Kuchekar
> Sent: Friday, September 13, 2013 1:46 PM
> To: solr-user@lucene.apache.org
>
> Subject: Re: Different Responses for 4.4 and 3.5 solr index
>
> Hi,
>
>    Following are the debug query results:
>
> [...]
>
> A search for the term 'cancer' in the field 'content' shows me the count to
> be 515.
>
> Please let me know if you have any questions or concerns.
>
> Thanks.
> Kuchekar, Nilesh
>
>
> On Fri, Sep 13, 2013 at 12:36 AM, Jack Krupansky wrote:
>
>  There may be some token filters that are emitting a different number of
>> terms. There are so many changes between 3.5 and 4.4, that it simply isn't
>> worth the trouble to track down all of them. In some cases, there may be
>> bugs in 3.5 that have gotten fixed in any of the intervening releases.
>>
>> Do you have a specific example - the input text and the field and field
>> type and analyzer where the tf differs? That should suggest where the
>> differences come from.
>>
>> Do you have any specific reason to believe that one of the counts is more
>> right than the other?
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Kuchekar
>> Sent: Thursday, September 12, 2013 4:50 PM
>>
>> To: solr-user@lucene.apache.org
>> Cc: Stefan 

Re: spellcheck causing Core Reload to hang

2013-09-13 Thread tamanjit.bin...@yahoo.co.in
Any specific error? Anything in the logs when it hangs?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/spellcheck-causing-Core-Reload-to-hang-tp4089866p4089931.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Best configuration for 2 servers

2013-09-13 Thread Branham, Jeremy [HR]
Thanks Shawn -

Does this sound appropriate then? [assuming no 3rd server]

Server A:
Zoo Keeper
SOLR with 1 shard

Server B:
SOLR with ZK Host parameter set to Server A



Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
Office: +1 (972) 405-2970 | Mobile: +1 (817) 791-1627
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, September 13, 2013 11:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Best configuration for 2 servers

On 9/13/2013 10:16 AM, Branham, Jeremy [HR] wrote:
> Currently, our SOLR 1.3 installation shares 4 applications servers with other 
> Java apps, leveraging master/slave replication.
>
> To get application isolation, we are moving from SOLR 1.3 to 4.3 and 
> acquiring 2 new production [vm] servers for the migration.
> For the new SOLR configuration, we are considering leveraging SOLR Cloud, but 
> there would be no shard redundancy with only 2 servers.
>
> Are there any good reasons to use a 2 shard cloud setup with no redundancy 
> versus a Master/Slave configuration on SOLR 4.3?

You should go to Solr 4.4, not 4.3.  Version 4.5 will be out soon, so unless 
you're going to go live before 2-3 weeks go by, you should probably plan on 
going with 4.5 instead.  Version 4.5 would be a
*really* good idea if you're going to use SolrCloud, as there are significant 
indexing improvements coming.

With SolrCloud, you can have shard redundancy with two servers.  All shards 
will exist on both servers.  What you won't have with only two servers is 
zookeeper redundancy, and zookeeper is *critical* for SolrCloud.  If you can 
add a third server with minimal CPU/RAM that's just for zookeeper, you can have 
that redundancy with no problem.  This is what I have done for my small cloud 
install.  It's the sort of thing that you could even just throw a desktop 
computer on the network to do, as long as you monitor it really well so you 
know if it ever goes down.

My large main Solr install is two completely independent sharded index copies 
*WITHOUT* SolrCloud.  It's been that way since Solr 3.x because master/slave 
replication is too inflexible.

Thanks,
Shawn





This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.



Federated Search Design Question

2013-09-13 Thread Alejandro Calbazana
Hi,

I have a general design question about federated search that I'd like to
get some thoughts on.

I have several line of business applications that manage their own data.
There is a need to search across these LOB apps, but each of them have
different authorization schemes in terms of allowing users access to data.
None of this data lives in Solr at the moment.

Ideally, everyone would push their data to Solr and we'd rationalize a
common ACL model for authorization.  Everything would be relatively
straightforward.  Unfortunately, I'm not going to be able to solve the ACL
problem in my timeline.

As an alternative, one consideration is to use Solr as sort of a cache
where data is pulled from individual endpoints and stored. A final query
would be made against the results stored in Solr for combined results.

Has anyone used Solr in this way?  I understand that this might be an
unusual usage, results are likely going to be thrown away as queries
change, and there is overhead in committing.  If results were pushed into
memory, that might be enough for this purpose.

If there are alternatives, I'm open to suggestions.

Thanks!

Al


changing int to long - does it definitely require a reindex?

2013-09-13 Thread Ty
I messed up.  In my 50+ million document index, I have a field in my
schema.xml that is of type "int".  I should have made it a long; documents
with a field value that overflows that integer aren't being indexed.

Does changing this field type absolutely, positively require a re-index?
 What would happen if I simply changed the data type without re-indexing
old documents?

Thanks,
Ty


Re: Stop filter changes in Solr >= 4.4

2013-09-13 Thread Christopher Condit
Here's the field definition (it did not survive archiving):

Here's the stack trace:
WARNING: org.apache.solr.client.solrj.SolrServerException:
java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams
org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.SolrServerException:
java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:116)
at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:136)
at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:125)
at 
edu.sdsc.nif.vocabulary.VocabularySolrImpl.addTerm(VocabularySolrImpl.java:67)
at 
edu.sdsc.nif.vocabulary.VocabularySolrImplTest.testGetTermFromIdAndProvider(VocabularySolrImplTest.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.lang.IllegalArgumentException: enablePositionIncrements=false is
not supported anymore as of Lucene 4.4 as it can create broken token
streams
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 32 more
Caused by: java.lang.IllegalArgumentException:
enablePositionIncrements=false is not supported anymore as of Lucene
4.4 as it can create broken token streams
at 
org.apache.lucene.analysis.util.FilteringTokenFilter.checkPositionIncrement(FilteringTokenFilter.java:40)
at 
org.apache.lucene.analysis.util.FilteringTokenFilter.setEnablePositionIncrements(FilteringTokenFilter.java:140)
at 
org.apache.lucene.analysis.core.StopFilterFactory.create(StopFilterFactory.java:88)
at 
org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:67)
at 
org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:66)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:177)
at org.apache.lucene.document.Field.tokenStream(Field.java:552)
at 
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:95)
at 
org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:245)
at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:265)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(Upda

explicite deltaimports by given ids

2013-09-13 Thread Peter Schütt
Hello,
I want to trigger a deltaImportQuery for given IDs.

Example:

query="select oid, att1, att2 from my_table"

deltaImportQuery="select oid, att1, att2 from my_table 
   WHERE oid=${dih.delta.OID}"

deltaQuery="select OID from my_table WHERE
TIME_STAMP > TO_DATE
(${dih.last_index_time:VARCHAR}, '-MM-DD HH24:MI:SS')"

deletedPkQuery="select OID from my_table
   where TIME_STAMP > TO_DATE(${dih.last_index_time:VARCHAR}, '-MM-
DD HH24:MI:SS')"
   

Pseudo URL: 

http://solr-server/solr/mycore/dataimport/?command=deltaImportQuery&oid=5&oid=6

to trigger the update or insert of the datasets with OID in (5, 6).

What is the correct way?
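
One possible pattern, as an untested sketch (DIH does expose request 
parameters as ${dataimporter.request.paramName}; the oids parameter name is 
my own invention): run a non-clean full-import restricted to the passed IDs:

query="select oid, att1, att2 from my_table
   where oid in (${dataimporter.request.oids})"

http://solr-server/solr/mycore/dataimport?command=full-import&clean=false&commit=true&oids=5,6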

Thanks for any hint.

Ciao
  Peter Schütt




Re: Different Responses for 4.4 and 3.5 solr index

2013-09-13 Thread Kuchekar
Hi,

Following are the debug query results:

*Solr 3.5*

60.67038 = sum of:
  60.67038 = max plus 1.0 times others of:
    0.44362593 = weight(content:cancer^0.5 in 21506339), product of:
      0.009291923 = queryWeight(content:cancer^0.5), product of:
        0.5 = boost
        3.5684927 = idf(docFreq=1682287, maxDocs=21947370)
        0.005207758 = queryNorm
      47.74318 = fieldWeight(content:cancer in 21506339), product of:
        13.379088 = tf(termFreq(content:cancer)=179)
        3.5684927 = idf(docFreq=1682287, maxDocs=21947370)
        1.0 = fieldNorm(field=content, doc=21506339)



*Solr 4.4 debug query:*

67.04259 = max plus 1.0 times others of:
  0.75314933 = weight(content:cancer^0.5 in 20543947) [DefaultSimilarity], result of:
    0.75314933 = score(doc=20543947, freq=515.0 = termFreq=515.0), product of:
      0.009295603 = queryWeight, product of:
        0.5 = boost
        3.5702603 = idf(docFreq=1678887, maxDocs=21941764)
        0.005207241 = queryNorm
      81.0221 = fieldWeight in 20543947, product of:
        22.693611 = tf(freq=515.0), with freq of:
          515.0 = termFreq=515.0
        3.5702603 = idf(docFreq=1678887, maxDocs=21941764)
        1.0 = fieldNorm(doc=20543947)


A search for the term 'cancer' in the field 'content' shows me the count to
be 515.

Please let me know if you have any questions or concerns.

Thanks.
Kuchekar, Nilesh


On Fri, Sep 13, 2013 at 12:36 AM, Jack Krupansky wrote:

> There may be some token filters that are emitting a different number of
> terms. There are so many changes between 3.5 and 4.4, that it simply isn't
> worth the trouble to track down all of them. In some cases, there may be
> bugs in 3.5 that have gotten fixed in any of the intervening releases.
>
> Do you have a specific example - the input text and the field and field
> type and analyzer where the tf differs? That should suggest where the
> differences come from.
>
> Do you have any specific reason to believe that one of the counts is more
> right than the other?
>
> -- Jack Krupansky
>
> -Original Message- From: Kuchekar
> Sent: Thursday, September 12, 2013 4:50 PM
>
> To: solr-user@lucene.apache.org
> Cc: Stefan Matheis
> Subject: Re: Different Responses for 4.4 and 3.5 solr index
>
> Hi,
>
> After triaging this further, we find that the termFrequency (tf) for
> the same field in the same doc in solr 3.5 and 4.4 is different.
>
> example :
>
> If word "fruits" appear in some field for 20 times
>
> In 3.5 tf is reported to be 8, where as in 4.4 solr it reports to be 20.
> that is changing the the score.
>
> Also we see that the function 'idf' which depends upon the max doc is
> changed.
>
> Are there any changes in 'termFrequency' and 'idf' function in solr 4.4
> compared to solr 3.5.
>
> Looking forward for your reply.
>
> Thanks.
> Kuchekar, Nilesh
>
>
> On Thu, Sep 12, 2013 at 11:30 AM, Kuchekar wrote:
>
>  Hi,
>>
>> Any updates on this? Is ranking computation dependent on the 'maxDoc'
>> value in Solr? Is this happening due to the changing value of 'maxDoc'
>> after each optimization? As in, in solr 4.4 every time optimization
>> is run, the 'maxDoc' value is reset, whereas this is not the case in solr
>> 3.5.
>>
>> L

Solr wildcard search

2013-09-13 Thread Prasi S
Hi all,
I am working with wildcard queries and a few things are confusing.

1. Does a wildcard search omit the analyzers on a particular field?

2. I have searched for:
q=google\ technology -> gives results
q=google technology -> gives results
q=google tech*   -> gives results
q=google\ tech* -> 0 results. The debug query for the last query is text:google tech*

Why does this happen?


Thanks,
Prasi


Best configuration for 2 servers

2013-09-13 Thread Branham, Jeremy [HR]
Currently, our SOLR 1.3 installation shares 4 applications servers with other 
Java apps, leveraging master/slave replication.

To get application isolation, we are moving from SOLR 1.3 to 4.3 and acquiring 
2 new production [vm] servers for the migration.
For the new SOLR configuration, we are considering leveraging SOLR Cloud, but 
there would be no shard redundancy with only 2 servers.

Are there any good reasons to use a 2 shard cloud setup with no redundancy 
versus a Master/Slave configuration on SOLR 4.3?

Thanks!



Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham




This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.


Re: Solr wildcard search

2013-09-13 Thread Jack Krupansky
Wildcard applies only to a single term. The escaped space suggests that you 
are trying to match a wildcard on multiple terms.


Try the contrib complex phrase query parser.

-- Jack Krupansky

-Original Message- 
From: Prasi S

Sent: Friday, September 13, 2013 6:37 AM
To: solr-user@lucene.apache.org
Subject: Solr wildcard search

Hi all,
I am working with wildcard queries and a few things are confusing.

1. Does a wildcard search omit the analyzers on a particular field?

2. I have searched for:
q=google\ technology -> gives results
q=google technology -> gives results
q=google tech*   -> gives results
q=google\ tech* -> 0 results. The debug query for the last query is text:google tech*

Why does this happen?


Thanks,
Prasi 



spellcheck causing Core Reload to hang

2013-09-13 Thread Raheel Hasan
Hi,

after a lot of investigation today, I found that it's the spellcheck
component which is causing the issue. If it's turned off, all runs well
and the core can easily reload. However, when the spellcheck is on, the
core won't reload and instead hangs forever.

Then the only way to get the project back alive is to stop Solr, delete
the data folder, and then start Solr again.

Here are the solr config settings for spell check:



   
   default
   on
   5
   false
   5
   2
   false

   true
   3
   3
   true


 
   spellcheck
 





text_en_splitting


  default
  location_details
  solr.DirectSolrSpellChecker
  true
  0.5
  .01
  1
  3
  3
  4
  0.001


  


Here is the field from the schema (the definition did not survive archiving):



-- 
Regards,
Raheel Hasan


Re: Escaping *, ? in Solr

2013-09-13 Thread Jack Krupansky
Asterisk and question mark are wildcards, not regex. A regex query is a 
regular expression enclosed in slashes, such as:

q=/Googl.*/

And note that not all analyzer filters will be applied to regex terms. You 
may need to do the analysis yourself, although simple filters like the lower 
case filter should work fine.


-- Jack Krupansky

-Original Message- 
From: Prasi S

Sent: Friday, September 13, 2013 3:56 AM
To: solr-user@lucene.apache.org
Subject: Escaping *, ? in Solr

Hi,
I want to do regex search in solr.

E.g.: Googl*.  In my query API, I have used the ClientUtils.escapeQueryChars
function to escape characters special to Solr.

In the above case, a search for
1. Google -> gives 677 records.
2. Googl* -> escaped as Googl\* in code -> gives 12 results
3. When given q=Google* directly in the browser -> gives 677 records.

Which is correct if I want to achieve a regex search (Googl*)?  Should I
refrain from escaping *, ? in the code when handling regex?

Pls suggest.

Thanks,
Prasi. 



Re: Best configuration for 2 servers

2013-09-13 Thread Shawn Heisey

On 9/13/2013 10:16 AM, Branham, Jeremy [HR] wrote:

Currently, our SOLR 1.3 installation shares 4 applications servers with other 
Java apps, leveraging master/slave replication.

To get application isolation, we are moving from SOLR 1.3 to 4.3 and acquiring 
2 new production [vm] servers for the migration.
For the new SOLR configuration, we are considering leveraging SOLR Cloud, but 
there would be no shard redundancy with only 2 servers.

Are there any good reasons to use a 2 shard cloud setup with no redundancy 
versus a Master/Slave configuration on SOLR 4.3?


You should go to Solr 4.4, not 4.3.  Version 4.5 will be out soon, so 
unless you're going to go live before 2-3 weeks go by, you should 
probably plan on going with 4.5 instead.  Version 4.5 would be a 
*really* good idea if you're going to use SolrCloud, as there are 
significant indexing improvements coming.


With SolrCloud, you can have shard redundancy with two servers.  All 
shards will exist on both servers.  What you won't have with only two 
servers is zookeeper redundancy, and zookeeper is *critical* for 
SolrCloud.  If you can add a third server with minimal CPU/RAM that's 
just for zookeeper, you can have that redundancy with no problem.  This 
is what I have done for my small cloud install.  It's the sort of thing 
that you could even just throw a desktop computer on the network to do, 
as long as you monitor it really well so you know if it ever goes down.


My large main Solr install is two completely independent sharded index 
copies *WITHOUT* SolrCloud.  It's been that way since Solr 3.x because 
master/slave replication is too inflexible.


Thanks,
Shawn



Re: Escaping *, ? in Solr

2013-09-13 Thread Shawn Heisey

On 9/13/2013 1:56 AM, Prasi S wrote:

I want to do regex search in solr.

E.g.: Googl*.  In my query API, I have used the ClientUtils.escapeQueryChars
function to escape characters special to Solr.

In the above case, a search for
1. Google -> gives 677 records.
2. Googl* -> escaped as Googl\* in code -> gives 12 results
3. When given q=Google* directly in the browser -> gives 677 records.

Which is correct if I want to achieve a regex search (Googl*)?  Should I
refrain from escaping *, ? in the code when handling regex?


Your third example is using * as a wildcard.  That's NOT the same thing 
as regex.


If you sent q=/Google.*/ then that would be a regex that should do the 
same thing as your wildcard example.  This requires Solr 4.0 or later.


You can't use the escapeQueryChars method if you're wanting to do regex 
or wildcard search.  The point behind that escape method is to search 
for special characters rather than let them have their special meanings.
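
In SolrJ terms, the distinction looks something like this (an untested sketch):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

String input = "Googl*";
// Escape only when the asterisk should match a literal character:
String literal = ClientUtils.escapeQueryChars(input);  // -> Googl\*
// Leave it unescaped when the asterisk is meant as a wildcard:
SolrQuery q = new SolrQuery(input);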


Thanks,
Shawn



Re: "Unable to connect" to "http://localhost:8983/solr/"

2013-09-13 Thread Raheel Hasan
You are right, sir; it's weird to have no error in the log... So after a full
day spent only on trying to figure this out, I have found the cause
(the spellcheck component)... but not the solution.

See my other post with the subject "*spellcheck causing Core Reload to hang*".
I have explained it there.

Thanks a lot.



On Fri, Sep 13, 2013 at 9:24 PM, Shawn Heisey  wrote:

> On 9/13/2013 5:47 AM, Raheel Hasan wrote:
>
>> Ok, I have solved it myself. The issue was in the "data" directory of
>> "solr/{myCore}/". I deleted this folder and it started running again.
>>
>> However, this is an even bigger issue now, because when the project is LIVE
>> and has indexed millions of records, I won't have the option to remove
>> the "data" folder again.
>>
>> So is there a different solution here? How do I save the indexes?
>>
>
> The log you provided didn't have any error or warn messages in it, so
> there's no clue about what went wrong.
>
> If you have to delete the data directory, it usually means that your index
> is corrupt, you've changed the schema in a way that's completely
> incompatible with the existing index, or something else has gone very
> wrong.  It's very weird that there's no error message in the log, though -
> problems like that typically have an error message with a long Java
> stacktrace.
>
> Thanks,
> Shawn
>
>


-- 
Regards,
Raheel Hasan


RE: Best configuration for 2 servers

2013-09-13 Thread Branham, Jeremy [HR]
Or another possible scenario -

SOLR Cloud with one logical shard and 2 servers would give me replication 
without the master/slave setup.
Is that correct?



Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham


-Original Message-
From: Branham, Jeremy [HR]
Sent: Friday, September 13, 2013 11:17 AM
To: SOLR User distro (solr-user@lucene.apache.org)
Subject: Best configuration for 2 servers

Currently, our SOLR 1.3 installation shares 4 applications servers with other 
Java apps, leveraging master/slave replication.

To get application isolation, we are moving from SOLR 1.3 to 4.3 and acquiring 
2 new production [vm] servers for the migration.
For the new SOLR configuration, we are considering leveraging SOLR Cloud, but 
there would be no shard redundancy with only 2 servers.

Are there any good reasons to use a 2 shard cloud setup with no redundancy 
versus a Master/Slave configuration on SOLR 4.3?

Thanks!



Jeremy D. Branham
Performance Technologist II
Sprint University Performance Support
Fort Worth, TX | Tel: **DOTNET
http://JeremyBranham.Wordpress.com
http://www.linkedin.com/in/jeremybranham




This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.



This e-mail may contain Sprint proprietary information intended for the sole 
use of the recipient(s). Any use by others is prohibited. If you are not the 
intended recipient, please contact the sender and delete all copies of the 
message.



Re: "Unable to connect" to "http://localhost:8983/solr/"

2013-09-13 Thread Shawn Heisey

On 9/13/2013 5:47 AM, Raheel Hasan wrote:

Ok, I have solved it myself. The issue was in the "data" directory of
"solr/{myCore}/". I deleted this folder and it started running again.

However, this is an even bigger issue now, because when the project is LIVE
and has indexed millions of records, I won't have the option to remove
the "data" folder again.

So is there a different solution here? How do I save the indexes?


The log you provided didn't have any error or warn messages in it, so 
there's no clue about what went wrong.


If you have to delete the data directory, it usually means that your 
index is corrupt, you've changed the schema in a way that's completely 
incompatible with the existing index, or something else has gone very 
wrong.  It's very weird that there's no error message in the log, though 
- problems like that typically have an error message with a long Java 
stacktrace.


Thanks,
Shawn



Escaping *, ? in Solr

2013-09-13 Thread Prasi S
Hi,
I want to do regex search in solr.

E.g.: Googl*.  In my query API, I have used the ClientUtils.escapeQueryChars
function to escape characters special to Solr.

In the above case, a search for
1. Google -> gives 677 records.
2. Googl* -> escaped as Googl\* in code -> gives 12 results
3. When given q=Google* directly in the browser -> gives 677 records.

Which is correct if I want to achieve a regex search (Googl*)?  Should I
refrain from escaping *, ? in the code when handling regex?

Pls suggest.

Thanks,
Prasi.


Re: Profiling Solr Lucene for query

2013-09-13 Thread Dmitry Kan
Manuel,

Whether to have the front-end Solr act as the aggregator of shard results
depends on your requirements. To repeat, we found merging from many shards
very inefficient for our use case. It can be the opposite for you (i.e., it
requires testing). There are some limitations with distributed search; see here:
http://docs.lucidworks.com/display/solr/Distributed+Search+with+Index+Sharding


On Wed, Sep 11, 2013 at 3:35 PM, Manuel Le Normand <
manuel.lenorm...@gmail.com> wrote:

> Dmitry - currently we don't have such a front end; this sounds like a good
> idea, creating it. And yes, we do query all 36 shards on every query.
>
> Mikhail - I do think 1 minute is enough data, as during this exact minute I
> had a single query running (that took a qtime of 1 minute). I wanted to
> isolate these hard queries. I repeated this profiling a few times.
>
> I think I will take the termInterval from 128 to 32 and check the results.
> I'm currently using NRTCachingDirectoryFactory.
>
>
>
>
> On Mon, Sep 9, 2013 at 11:29 PM, Dmitry Kan  wrote:
>
> > Hi Manuel,
> >
> > The frontend solr instance is the one that does not have its own index
> and
> > is doing merging of the results. Is this the case? If yes, are all 36
> > shards always queried?
> >
> > Dmitry
> >
> >
> > On Mon, Sep 9, 2013 at 10:11 PM, Manuel Le Normand <
> > manuel.lenorm...@gmail.com> wrote:
> >
> > > Hi Dmitry,
> > >
> > > I have solr 4.3 and every query is distributed and merged back for
> > ranking
> > > purpose.
> > >
> > > What do you mean by frontend solr?
> > >
> > >
> > > On Mon, Sep 9, 2013 at 2:12 PM, Dmitry Kan 
> wrote:
> > >
> > > > are you querying your shards via a frontend solr? We have noticed,
> that
> > > > querying becomes much faster if results merging can be avoided.
> > > >
> > > > Dmitry
> > > >
> > > >
> > > > On Sun, Sep 8, 2013 at 6:56 PM, Manuel Le Normand <
> > > > manuel.lenorm...@gmail.com> wrote:
> > > >
> > > > > Hello all,
> > > > > Looking at the 10% slowest queries, I get very bad performance
> > > > > (~60 sec per query).
> > > > > These queries have lots of conditions on my main field (more than a
> > > > > hundred), including phrase queries and rows=1000. I do return only
> > > > > id's though.
> > > > > I can quite firmly say that this bad performance is due to a slow
> > > > > storage issue (beyond my control for now). Despite this I want to
> > > > > improve my performance.
> > > > >
> > > > > As taught in school, I started profiling these queries, and the data
> > > > > of a ~1 minute profile is located here:
> > > > >
> http://picpaste.com/pics/IMG_20130908_132441-ZyrfXeTY.1378637843.jpg
> > > > >
> > > > > Main observation: most of the time I do wait for readVInt, whose
> > > > > stacktrace (2 out of 2 thread dumps) is:
> > > > >
> > > > > catalina-exec-3870 - Thread t@6615
> > > > >  java.lang.Thread.State: RUNNABLE
> > > > >  at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
> > > > >  at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnumFrame.loadBlock(BlockTreeTermsReader.java:2357)
> > > > >  at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BlockTreeTermsReader.java:1745)
> > > > >  at org.apache.lucene.index.TermContext.build(TermContext.java:95)
> > > > >  at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.search.PhraseQuery$PhraseWeight.<init>(PhraseQuery.java:221)
> > > > >  at
> > > >
> org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:326)
> > > > >  at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.search.BooleanQuery$BooleanWeight.(BooleanQuery.java:183)
> > > > >  at
> > > > >
> > >
> org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> > > > >  at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.searth.BooleanQuery$BooleanWeight.(BooleanQuery.java:183)
> > > > >  at
> > > > >
> > >
> oro.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> > > > >  at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.searth.BooleanQuery$BooleanWeight.(BooleanQuery.java:183)
> > > > >  at
> > > > >
> > >
> org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:384)
> > > > >  at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:675)
> > > > >  at
> > > org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
> > > > >
> > > > >
> > > > > So I do actually wait for IO as expected, but I might be page
> > > > > faulting too many times while looking for the TermBlocks (tim file),
> > > > > i.e. locating the term.
> > > > > As I reindex now, would it be useful to lower the termInterval
> > > > > (default 128)? As the FST (tip files) are that small (a few 10-100
> > > > > MB) there are no memory contentions, c

Re: Re: Unable to getting started with SOLR

2013-09-13 Thread Rah1x
I have the same issue. Can anyone tell me if they found a solution?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unable-to-getting-started-with-SOLR-tp3497276p4089761.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: "Unable to connect" to "http://localhost:8983/solr/"

2013-09-13 Thread Raheel Hasan
?? anyone?


On Thu, Sep 12, 2013 at 8:12 PM, Raheel Hasan wrote:

> Hi,
>
> I just had this issue come out of nowhere.
> Everything was fine until, all of a sudden, the browser can't connect to
> this Solr instance.
>
>
> Here is the solr log:
>
> INFO  - 2013-09-12 20:07:58.142; org.eclipse.jetty.server.Server;
> jetty-8.1.8.v20121106
> INFO  - 2013-09-12 20:07:58.179;
> org.eclipse.jetty.deploy.providers.ScanningAppProvider; Deployment monitor
> E:\Projects\G1\A1\trunk\solr_root\solrization\contexts at interval 0
> INFO  - 2013-09-12 20:07:58.191;
> org.eclipse.jetty.deploy.DeploymentManager; Deployable added:
> E:\Projects\G1\A1\trunk\solr_root\solrization\contexts\solr-jetty-context.xml
> INFO  - 2013-09-12 20:07:59.159;
> org.eclipse.jetty.webapp.StandardDescriptorProcessor; NO JSP Support for
> /solr, did not find org.apache.jasper.servlet.JspServlet
> INFO  - 2013-09-12 20:07:59.189;
> org.eclipse.jetty.server.handler.ContextHandler; started
> o.e.j.w.WebAppContext{/solr,file:/E:/Projects/G1/A1/trunk/solr_root/solrization/solr-webapp/webapp/},E:\Projects\G1\A1\trunk\solr_root\solrization/webapps/solr.war
> INFO  - 2013-09-12 20:07:59.190;
> org.eclipse.jetty.server.handler.ContextHandler; started
> o.e.j.w.WebAppContext{/solr,file:/E:/Projects/G1/A1/trunk/solr_root/solrization/solr-webapp/webapp/},E:\Projects\G1\A1\trunk\solr_root\solrization/webapps/solr.war
> INFO  - 2013-09-12 20:07:59.206;
> org.apache.solr.servlet.SolrDispatchFilter; SolrDispatchFilter.init()
> INFO  - 2013-09-12 20:07:59.231; org.apache.solr.core.SolrResourceLoader;
> JNDI not configured for solr (NoInitialContextEx)
> INFO  - 2013-09-12 20:07:59.231; org.apache.solr.core.SolrResourceLoader;
> solr home defaulted to 'solr/' (could not find system property or JNDI)
> INFO  - 2013-09-12 20:07:59.241;
> org.apache.solr.core.CoreContainer$Initializer; looking for solr config
> file: E:\Projects\G1\A1\trunk\solr_root\solrization\solr\solr.xml
> INFO  - 2013-09-12 20:07:59.244; org.apache.solr.core.CoreContainer; New
> CoreContainer 24012447
> INFO  - 2013-09-12 20:07:59.244; org.apache.solr.core.CoreContainer;
> Loading CoreContainer using Solr Home: 'solr/'
> INFO  - 2013-09-12 20:07:59.245; org.apache.solr.core.SolrResourceLoader;
> new SolrResourceLoader for directory: 'solr/'
> INFO  - 2013-09-12 20:07:59.483;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> socketTimeout to: 0
> INFO  - 2013-09-12 20:07:59.484;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> urlScheme to: http://
> INFO  - 2013-09-12 20:07:59.485;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> connTimeout to: 0
> INFO  - 2013-09-12 20:07:59.486;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> maxConnectionsPerHost to: 20
> INFO  - 2013-09-12 20:07:59.487;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> corePoolSize to: 0
> INFO  - 2013-09-12 20:07:59.488;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> maximumPoolSize to: 2147483647
> INFO  - 2013-09-12 20:07:59.489;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> maxThreadIdleTime to: 5
> INFO  - 2013-09-12 20:07:59.490;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> sizeOfQueue to: -1
> INFO  - 2013-09-12 20:07:59.490;
> org.apache.solr.handler.component.HttpShardHandlerFactory; Setting
> fairnessPolicy to: false
> INFO  - 2013-09-12 20:07:59.498;
> org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client,
> config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
> INFO  - 2013-09-12 20:07:59.671; org.apache.solr.core.CoreContainer;
> Registering Log Listener
> INFO  - 2013-09-12 20:07:59.689; org.apache.solr.core.CoreContainer;
> Creating SolrCore 'A1' using instanceDir: solr\A1
> INFO  - 2013-09-12 20:07:59.690; org.apache.solr.core.SolrResourceLoader;
> new SolrResourceLoader for directory: 'solr\A1\'
> INFO  - 2013-09-12 20:07:59.724; org.apache.solr.core.SolrConfig; Adding
> specified lib dirs to ClassLoader
> INFO  - 2013-09-12 20:07:59.726; org.apache.solr.core.SolrResourceLoader;
> Adding
> 'file:/E:/Projects/G1/A1/trunk/solr_root/solrization/lib/mysql-connector-java-5.1.25-bin.jar'
> to classloader
> INFO  - 2013-09-12 20:07:59.727; org.apache.solr.core.SolrResourceLoader;
> Adding
> 'file:/E:/Projects/G1/A1/trunk/solr_root/contrib/dataimporthandler/lib/activation-1.1.jar'
> to classloader
> INFO  - 2013-09-12 20:07:59.727; org.apache.solr.core.SolrResourceLoader;
> Adding
> 'file:/E:/Projects/G1/A1/trunk/solr_root/contrib/dataimporthandler/lib/mail-1.4.1.jar'
> to classloader
> INFO  - 2013-09-12 20:07:59.728; org.apache.solr.core.SolrResourceLoader;
> Adding
> 'file:/E:/Projects/G1/A1/trunk/solr_root/dist/solr-dataimporthandler-4.3.0.jar'
> to classloader
> INFO  - 2013-09-12 20:07:59.729; org.apache.solr.core.SolrResourceLoader;
> Adding
> 'file:/E:/Projects/G1/A1/

Re: Facet counting empty as well.. how to prevent this?

2013-09-13 Thread Upayavira
The simplest thing is to exclude empty values in the query: myfield:[* TO *]
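
For example, as a filter query next to the facet parameters (the field name
is a placeholder):

fq=myfield:[* TO *]&facet=true&facet.field=myfield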

Upayavira

On Thu, Sep 12, 2013, at 03:50 PM, Raheel Hasan wrote:
> ok, so I got the idea... I will pull 7 fields instead and remove the
> empty
> one...
> 
> But there must be some setting that can be done in the facet configuration
> to ignore certain values if we want to
> 
> 
> On Thu, Sep 12, 2013 at 7:44 PM, Shawn Heisey  wrote:
> 
> > On 9/12/2013 7:54 AM, Raheel Hasan wrote:
> > > I got a small issue here, my facet settings are returning counts for
> > > empty "", i.e. when the actual field was empty.
> > >
> > > Here are the facet settings:
> > >
> > > <str name="facet.sort">count</str>
> > > <str name="facet.limit">6</str>
> > > <str name="facet.mincount">1</str>
> > > <str name="facet.missing">false</str>
> > >
> > > and this is the part of the result I don't want:
> > > <int name="">4</int>
> >
> > The "facet.missing" parameter has to do with whether or not to display
> > counts for documents that have no value at all for that field.
> >
> > Even though it might seem wrong, the empty string is a valid value, so
> > you can't fix this with faceting parameters.  If you don't want that to
> > be in your index, then you can add the LengthFilterFactory to your
> > analyzer to remove terms with a length less than 1.  You might also
> > check to see whether the field definition in your schema has a default
> > value set to the empty string.
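> >
> > A minimal sketch of what that filter looks like in the analyzer chain
> > (the min/max values here are illustrative):
> >
> >   <filter class="solr.LengthFilterFactory" min="1" max="256"/>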
> >
> > If you are using DocValues (Solr 4.2 and later), then the indexed terms
> > aren't used for facets, and it won't matter what you do to your analysis
> > chain.  With DocValues, Solr basically uses a value equivalent to the
> > stored value.  To get rid of the empty string with DocValues, you'll
> > need to either change your indexing process so it doesn't send empty
> > strings, or use a custom UpdateProcessor to change the data before it
> > gets indexed.
> >
> > Thanks,
> > Shawn
> >
> >
> 
> 
> -- 
> Regards,
> Raheel Hasan


Re: Solr 4.5 spatial search - distance and score

2013-09-13 Thread Bill Bell
You can apply his 4.5 patches to 4.4, or take trunk, where it is already there.
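
Once that code is in, a distance sort should look roughly like this (the
sfield and pt values are illustrative):

sort=geodist() asc&sfield=store&pt=45.15,-93.85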

Bill Bell
Sent from mobile


On Sep 12, 2013, at 6:23 PM, Weber  wrote:

> I'm trying to get score by using a custom boost and also get the distance. I
> found David's code* to get it using "Intersects", which I want to replace by
> {!geofilt} or geodist()
> 
> *David's code: https://issues.apache.org/jira/browse/SOLR-4255
> 
> He told me geodist() will be available again for this kind of field, which
> is a geohash type.
> 
> Then, I'd like to know how it can be done today on 4.4 with {!geofilt} and
> how it will be done on 4.5 using geodist()
> 
> Thanks in advance.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-4-5-spatial-search-distance-and-score-tp4089706.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Storing/indexing speed drops quickly

2013-09-13 Thread Per Steffensen

On 9/12/13 4:26 PM, Shawn Heisey wrote:
> On 9/12/2013 2:14 AM, Per Steffensen wrote:
>> Starting from an empty collection. Things are fine wrt
>> storing/indexing speed for the first two-three hours (100M docs per
>> hour), then speed goes down dramatically, to an, for us, unacceptable
>> level (max 10M per hour). At the same time as speed goes down, we see
>> that I/O wait increases dramatically. I am not 100% sure, but quick
>> investigation has shown that this is due to almost constant merging.
>
> While constant merging is contributing to the slowdown, I would guess
> that your index is simply too big for the amount of RAM that you have.
> Let's ignore for a minute that you're distributed and just concentrate
> on one machine.
>
> After three hours of indexing, you have nearly 300 million documents.
> If you have a replicationFactor of 1, that's still 50 million documents
> per machine.  If your replicationFactor is 2, you've got 100 million
> documents per machine.  Let's focus on the smaller number for a minute.

replicationFactor is 1, so that is about 50 million docs per machine at
this point.


> 50 million documents in an index, even if they are small documents, is
> probably going to result in an index size of at least 20GB, and quite
> possibly larger.  In order to make Solr function with that many
> documents, I would guess that you have a heap that's at least 4GB in size.

Currently I have a 2.5GB heap on the 8GB machine - to leave something for
the OS cache.


> With only 8GB on the machine, this doesn't leave much RAM for the OS
> disk cache.  If we assume that you have 4GB left for caching, then I
> would expect to see problems about the time your per-machine indexes hit
> 15GB in size.  If you are making it beyond that with a total of 300
> million documents, then I am impressed.
>
> Two things are going to happen when you have enough documents:  1) You
> are going to fill up your Java heap and Java will need to do frequent
> collections to free up enough RAM for normal operation.  When this
> problem gets bad enough, the frequent collections will be *full* GCs,
> which are REALLY slow.

What is it that will fill my heap? I am trying to avoid the FieldCache.
For now, I am actually not doing any searches - focus on indexing for
now - and certainly not group/facet/sort searches that will use the
FieldCache.

>    2) The index will be so big that the OS disk
> cache cannot effectively cache it.  I suspect that the latter is more of
> the problem, but both might be happening at nearly the same time.
>
> When dealing with an index of this size, you want as much RAM as you can
> possibly afford.  I don't think I would try what you are doing without
> at least 64GB per machine, and I would probably use at least an 8GB heap
> on each one, quite possibly larger.  With a heap that large, extreme GC
> tuning becomes a necessity.

More RAM will probably help, but only for a while. I want billions of
documents in my collections - and also on each machine. Currently we are
aiming at 15 billion documents per month (500 million per day) and keeping
at least two years of data in the system. Currently we use one collection
for each month, so when the system has been running for two years it
will be 24 collections with 15 billion documents each. Indexing will
only go on in the collection corresponding to the "current" month, but
searching will (potentially) be across all 24 collections. The documents
are very small. I know that 6 machines will not do in the long run -
currently this is only testing - but the number of machines should not be
higher than about 20-40. In general it is a problem if Solr/Lucene will
not perform fairly well when data does not fit in RAM - then it cannot
really be used for "big data". I would have to buy hundreds or even
thousands of machines with 64GB+ RAM. That is not realistic.
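
For what it's worth, each month's collection is created up front with the
Collections API, roughly like this (host, name, and counts are illustrative):

http://host:8983/solr/admin/collections?action=CREATE&name=coll_2013_09&numShards=6&replicationFactor=1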


> To cut down on the amount of merging, I go with a fairly large
> mergeFactor, but mergeFactor is basically deprecated for
> TieredMergePolicy, there's a new way to configure it now.  Here's the
> indexConfig settings that I use on my dev server:
>
> <indexConfig>
>   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>     <int name="maxMergeAtOnce">35</int>
>     <int name="segmentsPerTier">35</int>
>     <int name="maxMergeAtOnceExplicit">105</int>
>   </mergePolicy>
>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>     <int name="maxThreadCount">1</int>
>     <int name="maxMergeCount">6</int>
>   </mergeScheduler>
>   <ramBufferSizeMB>48</ramBufferSizeMB>
>   <infoStream>false</infoStream>
> </indexConfig>
>
> Thanks,
> Shawn



Thanks!


Re: Stop filter changes in Solr >= 4.4

2013-09-13 Thread Shalin Shekhar Mangar
Can we see a full stack trace for that IllegalArgumentException?
AFAIK, enablePositionIncrements=false is deprecated in 4.x but not
removed. It will be removed in 5.0 though.
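
For reference, the analyzer filter line that triggers it looks like this
(the stopwords file name is illustrative):

<filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="false"/>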

On Fri, Sep 13, 2013 at 3:34 AM, Christopher Condit  wrote:
> While attempting to upgrade from Solr 4.3.0 to Solr 4.4.0 I ran into
> this exception:
>
>  java.lang.IllegalArgumentException: enablePositionIncrements=false is
> not supported anymore as of Lucene 4.4 as it can create broken token
> streams
>
> which led me to https://issues.apache.org/jira/browse/LUCENE-4963.  I
> need to be able to match queries irrespective of intervening stopwords
> (which used to work with enablePositionIncrements="true"). For
> instance: "foo of the bar" would find documents matching "foo bar",
> "foo of bar", and "foo of the bar". With this option deprecated in
> 4.4.0 I'm not clear on how to maintain the same functionality.
>
> The package javadoc adds:
>
> If the selected analyzer filters the stop words "is" and "the", then
> for a document containing the string "blue is the sky", only the
> tokens "blue", "sky" are indexed, with position("sky") = 3 +
> position("blue"). Now, a phrase query "blue is the sky" would find
> that document, because the same analyzer filters the same stop words
> from that query. But the phrase query "blue sky" would not find that
> document because the position increment between "blue" and "sky" is
> only 1.
>
> If this behavior does not fit the application needs, the query parser
> needs to be configured to not take position increments into account
> when generating phrase queries.
>
> But there's no mention of how to actually configure the query parser
> to do this. Does anyone know how to deal with this issue as Solr moves
> toward 5.0?
>
> Crossposted from stackoverflow:
> http://stackoverflow.com/questions/18668376/solr-4-4-stopfilterfactory-and-enablepositionincrements



-- 
Regards,
Shalin Shekhar Mangar.


Re: Different Responses for 4.4 and 3.5 solr index

2013-09-13 Thread Jack Krupansky
There may be some token filters that are emitting a different number of
terms. There are so many changes between 3.5 and 4.4 that it simply isn't
worth the trouble to track down all of them. In some cases, there may be
bugs in 3.5 that have been fixed in one of the intervening releases.


Do you have a specific example - the input text and the field and field type 
and analyzer where the tf differs? That should suggest where the differences 
come from.
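
One quick way to compare raw term frequencies between the two servers is the
termfreq() function query, at least on the 4.4 side (the doc id, field, and
term here are placeholders):

q=id:YOURDOC&fl=id,termfreq(content,'fruits')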


Do you have any specific reason to believe that one of the counts is more 
right than the other?


-- Jack Krupansky

-Original Message- 
From: Kuchekar

Sent: Thursday, September 12, 2013 4:50 PM
To: solr-user@lucene.apache.org
Cc: Stefan Matheis
Subject: Re: Different Responses for 4.4 and 3.5 solr index

Hi,

After triaging this further, we find that the term frequency (tf) for
the same field in the same doc differs between Solr 3.5 and 4.4.

For example:

If the word "fruits" appears in some field 20 times,

in 3.5 the tf is reported to be 8, whereas in 4.4 Solr reports it to be 20.
That is changing the score.

We also see that the 'idf' function, which depends upon the max doc count,
has changed.

Are there any changes to the 'termFrequency' and 'idf' functions in Solr 4.4
compared to Solr 3.5?

Looking forward to your reply.

Thanks.
Kuchekar, Nilesh


On Thu, Sep 12, 2013 at 11:30 AM, Kuchekar wrote:


Hi,

Any updates on this? Is the ranking computation dependent on the 'maxDoc'
value in Solr? Is this happening due to the 'maxDoc' value changing
after each optimization? As in, in Solr 4.4 every time an optimization
is run, the 'maxDoc' value is reset, whereas this is not the case in Solr
3.5.

Looking forward for the reply.

Thanks.
Kuchekar, Nilesh


On Wed, Aug 28, 2013 at 3:32 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:


We've been seeing changes in our rankings as well.  I don't have a
definite answer yet, since we're waiting on an index rebuild, but our
current working theory is that the change to default omitNorms="true" for
primitive types may have had an effect, possibly due to follow on
confusion: our developers may have omitted norms from some other fields
they shouldn't have?

-Mike


On 08/26/2013 09:46 AM, Stefan Matheis wrote:


Did you check the scoring? (use fl=*,score to retrieve it) ..
additionally debugQuery=true might provide more information about how the
score was calculated.

- Stefan


On Monday, August 26, 2013 at 12:46 AM, Kuchekar wrote:

 Hi,

The response from 4.4 and 3.5 in the current scenario differs in the
sequence in which results are given us back.

For example :

Response from 3.5 solr is : id:A, id:B, id:C, id:D ...
Response from 4.4 solr is : id C, id:A, id:D, id:B...

Looking forward your reply.

Thanks.
Kuchekar, Nilesh


On Sun, Aug 25, 2013 at 11:32 AM, Stefan Matheis
(matheis.stefan@gmail.com) wrote:

 Kuchekar (hope that's your first name?)


you didn't tell us .. how they differ? do you get an actual error? or does
the result contain documents you didn't expect? or the other way round,
that some are missing you'd expect to be there?

- Stefan


On Sunday, August 25, 2013 at 4:43 PM, Kuchekar wrote:

 Hi,


We get different response when we query 4.4 and 3.5 solr using same
query params.

My query param are as following :

facet=true
&facet.mincount=1
&facet.limit=25

&qf=content^0.0+p_last_name^500.0+p_first_name^50.0+strong_topic^0.0+first_author_topic^0.0+last_author_topic^0.0+title_topic^0.0


&wt=javabin
&version=2
&rows=10
&f.affiliation_org.facet.limit=150
&fl=p_id,p_first_name,p_last_name
&start=0
&q=Apple
&facet.field=affiliation_org
&fq=table:profile
&fq=num_content:[*+TO+1500]
&fq=name:"Apple"

The content in both (solr 4.4 and solr 3.5) are same.

The solrconfig.xml from 3.5 an 4.4 are similarly constructed.

Is there something I am missing that might have been changed in 4.4,
which might be causing this issue? The "qf" params look the same.

Looking forward to your reply.

Thanks.
Kuchekar, Nilesh


Re: Different Responses for 4.4 and 3.5 solr index

2013-09-13 Thread Kuchekar
Hi,

After triaging this further, we find that the term frequency (tf) for
the same field in the same doc differs between Solr 3.5 and 4.4.

For example:

If the word "fruits" appears in some field 20 times,

in 3.5 the tf is reported to be 8, whereas in 4.4 Solr reports it to be 20.
That is changing the score.

We also see that the 'idf' function, which depends upon the max doc count,
has changed.

Are there any changes to the 'termFrequency' and 'idf' functions in Solr 4.4
compared to Solr 3.5?

Looking forward to your reply.

Thanks.
Kuchekar, Nilesh


On Thu, Sep 12, 2013 at 11:30 AM, Kuchekar wrote:

> Hi,
>
> Any updates on this? Is the ranking computation dependent on the 'maxDoc'
> value in Solr? Is this happening due to the 'maxDoc' value changing
> after each optimization? As in, in Solr 4.4 every time an optimization
> is run, the 'maxDoc' value is reset, whereas this is not the case in Solr
> 3.5.
>
> Looking forward for the reply.
>
> Thanks.
> Kuchekar, Nilesh
>
>
> On Wed, Aug 28, 2013 at 3:32 PM, Michael Sokolov <
> msoko...@safaribooksonline.com> wrote:
>
>> We've been seeing changes in our rankings as well.  I don't have a
>> definite answer yet, since we're waiting on an index rebuild, but our
>> current working theory is that the change to default omitNorms="true" for
>> primitive types may have had an effect, possibly due to follow on
>> confusion: our developers may have omitted norms from some other fields
>> they shouldn't have?
>>
>> -Mike
>>
>>
>> On 08/26/2013 09:46 AM, Stefan Matheis wrote:
>>
>>> Did you check the scoring? (use fl=*,score to retrieve it) ..
>>> additionally debugQuery=true might provide more information about how the
>>> score was calculated.
>>>
>>> - Stefan
>>>
>>>
>>> On Monday, August 26, 2013 at 12:46 AM, Kuchekar wrote:
>>>
>>>  Hi,
 The response from 4.4 and 3.5 in the current scenario differs in the
 sequence in which results are given us back.

 For example :

 Response from 3.5 solr is : id:A, id:B, id:C, id:D ...
 Response from 4.4 solr is : id C, id:A, id:D, id:B...

 Looking forward your reply.

 Thanks.
 Kuchekar, Nilesh


 On Sun, Aug 25, 2013 at 11:32 AM, Stefan Matheis
 (matheis.stefan@gmail.com) wrote:

  Kuchekar (hope that's your first name?)
>
> you didn't tell us .. how they differ? do you get an actual error? or
> does
> the result contain documents you didn't expect? or the other way round,
> that some are missing you'd expect to be there?
>
> - Stefan
>
>
> On Sunday, August 25, 2013 at 4:43 PM, Kuchekar wrote:
>
>  Hi,
>>
>> We get different response when we query 4.4 and 3.5 solr using same
>> query params.
>>
>> My query param are as following :
>>
>> facet=true
>> &facet.mincount=1
>> &facet.limit=25
>>
>> &qf=content^0.0+p_last_name^500.0+p_first_name^50.0+strong_topic^0.0+first_author_topic^0.0+last_author_topic^0.0+title_topic^0.0
>
>> &wt=javabin
>> &version=2
>> &rows=10
>> &f.affiliation_org.facet.limit=150
>> &fl=p_id,p_first_name,p_last_name
>> &start=0
>> &q=Apple
>> &facet.field=affiliation_org
>> &fq=table:profile
>> &fq=num_content:[*+TO+1500]
>> &fq=name:"Apple"
>>
>> The content in both (solr 4.4 and solr 3.5) are same.
>>
>> The solrconfig.xml from 3.5 an 4.4 are similarly constructed.
>>
>> Is there something I am missing that might have been changed in 4.4,
>>
> which
>
>> might be causing this issue. ?. The "qf" params looks same.
>>
>> Looking forward for your reply.
>>
>> Thanks.
>> Kuchekar, Nilesh
>>
>>
>


>>>
>>>
>>
>


Re: exceeded limit of maxWarmingSearchers

2013-09-13 Thread gfbj
I ended up having to increase the delay mathematically over time,
because the indexing would eventually outstrip any static value I set and
trip the maxWarmingSearchers limit.
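
For illustration, the shape of it in SolrJ - hasMoreBatches()/nextBatch() are
placeholders for your own batching logic:

long delayMs = 1000;
while (hasMoreBatches()) {
    server.add(nextBatch());                // index the next batch of documents
    server.commit();
    Thread.sleep(delayMs);                  // give warming searchers time to finish
    delayMs = Math.min(delayMs * 2, 60000); // grow the delay, capped at one minute
}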



--
View this message in context: 
http://lucene.472066.n3.nabble.com/exceeded-limit-of-maxWarmingSearchers-tp489803p4089699.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get the commit time of a document in Solr

2013-09-13 Thread phanichaitanya
Thanks Jack, Shawn and Raymond.

Shawn - I have to do it with every commit. So I guess there is no
way apart from writing custom Solr plugins.

I'll look into the pointers you suggested.

Regards,
Phani.



-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089722.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with delta import

2013-09-13 Thread umajava
Sorry, but I gave up on this issue. I could not resolve it.


On Tue, Sep 10, 2013 at 8:24 PM, suren [via Lucene] <
ml-node+s472066n4089093...@n3.nabble.com> wrote:

> Any update? I am also having the same issue. pls reply.
>
> This XML file does not appear to have any style information associated
> with it. The document tree is shown below.
> 
> 
> 0
> 7
> 
> 
> 
> db-data-config.xml
> 
> 
> delta-import
> idle
> 
> 
> 2
> 1
> 0
> 2013-09-10 07:46:34
> 2013-09-10 07:46:34
> 2013-09-10 07:46:35
> 2013-09-10 07:46:35
> 1
> 0
> 0:0:1.30
> 
> 
> This response format is experimental. It is likely to change in the
> future.
> 
> 
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://lucene.472066.n3.nabble.com/Need-help-with-delta-import-tp4025003p4089093.html
>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-help-with-delta-import-tp4025003p4089714.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Get the commit time of a document in Solr

2013-09-13 Thread phanichaitanya
Apologies again, but here is another try:

I want to make sure that documents that are indexed are committed in, say, an
hour. I agree that passing commitWithin params and the like will make
sure of that, based on the time configuration we set. But I want to make
sure that the document is really committed within whatever time we set using
commitWithin.
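
For reference, this is roughly how we pass it from SolrJ (assuming server is
an HttpSolrServer and doc is a SolrInputDocument):

server.add(doc, 3600000); // ask Solr to commit this document within one hour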

It's a question asking for proof that Solr commits within that time if we
add the commitWithin parameter to the configuration.

That is about the commitWithin parameter option that you suggested.

Now, is there a way to explicitly get all the documents that are committed
when a hard commit request is issued? This might not make sense, but we are
pondering that question.



-
Phani Chaitanya
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-the-commit-time-of-a-document-in-Solr-tp4089624p4089687.html
Sent from the Solr - User mailing list archive at Nabble.com.