Custom Hit Collector

2012-07-15 Thread Mike Schultz
As far as I can tell, using field collapsing prevents the
queryResultCache from being checked. It's important for our application to
have both.  There are threads on incorporating custom hit collectors, which
seems like it could be a way to implement the simplified collapsing I need
(just deduping based on the fieldCache value) while still consulting the
queryResultCache.

Does anyone know the state of being able to incorporate a custom hit collector,
say, in 4.0? Or, probably better, how to get caching to work with field
collapsing?

Mike
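For what it's worth, the "simplified collapsing" described here (dedupe on a fieldCache value, keep the best hit per value) reduces to logic like the following sketch. This is plain Python illustrating the idea only - it is not the Lucene Collector API, and the hit tuples and field-lookup function are invented for the example:

```python
def collapse_hits(hits, field_value_of):
    """Dedupe scored hits on a per-document field value, keeping the
    highest-scoring hit for each value (hits are (doc_id, score) pairs)."""
    best = {}
    for doc_id, score in hits:
        key = field_value_of(doc_id)  # analogous to a fieldCache lookup
        if key not in best or score > best[key][1]:
            best[key] = (doc_id, score)
    # return surviving hits sorted by score, descending
    return sorted(best.values(), key=lambda h: -h[1])

hits = [(1, 0.9), (2, 0.8), (3, 0.7), (4, 0.95)]
field = {1: "A", 2: "B", 3: "A", 4: "B"}.get
print(collapse_hits(hits, field))  # -> [(4, 0.95), (1, 0.9)]
```

A real custom collector would do the same bookkeeping inside its per-document collect callback, reading the group key from the field cache by docid.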

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Hit-Collector-tp3995073.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query results vs. facets results

2012-07-15 Thread tudor
Hello,

I am new to Solr and I am running some tests with our data in Solr. I am using
version 3.6, and the data is imported from a DB2 database using Solr's DIH.
We have defined a single entity in the db-data-config.xml, which is an
equivalent of the following query:
<entity name="connections"
        query="SELECT C.ID,
                      C.ID2,
                      F.CITY
               FROM NAME_CONNECTIONS AS C
               JOIN NAME_DETAILS AS F
               ON C.ID2 = F.ID">
</entity>

The ID in NAME_CONNECTIONS is not unique, so it might appear multiple times.

For the unique ID in the schema, we are using a solr.UUIDField:

<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>

All the searchable fields are declared as indexed and stored.

I am aware of the fact that this is a very crude configuration, but for the
tests that I am running it should be enough.

The problem that I have is the different result counts that I receive when I
do equivalent queries for searching and faceting. For example, running the
following query

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true

yields 

<int name="ngroups">134</int>

as a result, which is exactly what we expect.

On the other hand, running

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true

yields

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="CITY">
      <int name="MILTON">103</int>

I would expect to have the same number (134) in this facet result as the
previous filter result. Could you please let me know why these two results
are different?

Thank you,
Tudor 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Yonik Seeley
Do you have the following hard autoCommit in your config (as the stock
server does)?

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

This is now fairly important since Solr now tracks information on
every uncommitted document added.
At some point we should probably hardcode some mechanism based on
number of documents or time.
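For reference, the hard-autocommit block can already be bounded by document count as well as time; a hedged solrconfig.xml fragment (the maxDocs value is illustrative, not a recommendation):

```xml
<autoCommit>
  <maxDocs>10000</maxDocs>           <!-- commit after this many uncommitted adds -->
  <maxTime>15000</maxTime>           <!-- or after 15 seconds, whichever comes first -->
  <openSearcher>false</openSearcher> <!-- flush to disk without opening a new searcher -->
</autoCommit>
```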

-Yonik
http://lucidimagination.com


Re: Groups count in distributed grouping is wrong in some case

2012-07-15 Thread Agnieszka Kukałowicz
Hi,

I'm using SOLR 4.x from trunk. This was the version from 2012-07-10. So
this is one of the latest versions.

I searched mailing list and jira but found only this
https://issues.apache.org/jira/browse/SOLR-3436

It was committed in May to trunk so my version of SOLR has this fix. But
the problem still exists.

Cheers
Agnieszka

2012/7/15 Erick Erickson erickerick...@gmail.com

 what version of Solr are you using? There's been quite a bit of work
 on this lately,
 I'm not even sure how much has made it into 3.6. You might try searching
 the
 JIRA list, Martijn van Groningen has done a bunch of work lately, look for
 his name. Fortunately, it's not likely to get a bunch of false hits G..

 Best
 Erick

 On Fri, Jul 13, 2012 at 7:50 AM, Agnieszka Kukałowicz
 agnieszka.kukalow...@usable.pl wrote:
  Hi,
 
  I have a problem with faceting counts in distributed grouping. It appears
  only when I make a query that returns almost all of the documents.
 
  My SOLR implementation has 4 shards, and my queries look like:
 
  http://host:port
 
 /select/?q=*:*&shards=shard1,shard2,shard3,shard4&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
 
  With a query like the above I get strange counts for the category1 field.
  The counts for values are very big:
  <int name="val1">9659</int>
  <int name="val2">7015</int>
  <int name="val3">5676</int>
  <int name="val4">1180</int>
  <int name="val5">1105</int>
  <int name="val6">979</int>
  <int name="val7">770</int>
  <int name="val8">701</int>
  int name=612/int
  <int name="val9">422</int>
  <int name="val10">358</int>
 
  When I narrow the results by adding fq=category1:val1, etc., to the query,
  I get different counts than the category1 facet shows
  for the first few values:
 
  fq=category1:val1 - counts: 22
  fq=category1:val2 - counts: 22
  fq=category1:val3 - counts: 21
  fq=category1:val4 - counts: 19
  fq=category1:val5 - counts: 19
  fq=category1:val6 - counts: 20
  fq=category1:val7 - counts: 20
  fq=category1:val8 - counts: 25
  fq=category1:val9 - counts: 422
  fq=category1:val10 - counts: 358
 
  From val9 the count is ok.
 
  First I thought that for some values in the category1 facet the group count
  does not work, and that it returns counts of all documents, not grouped by
  the id field. But the number of all documents matching the query
  fq=category1:val1 is 45468, so the numbers are not the same.
 
  I checked the queries on each shard for val1, and the results are:
 
  shard1:
  query:
 
 http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
 
  <lst name="fcategory">
  <int name="val1">11</int>
 
  query:
 
 http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1
 
  shard 2:
  query:
 
 http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
 
  there is no value val1 in category1 facet.
 
  query:
 
 http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1
 
  <int name="ngroups">7</int>
 
  shard3:
  query:
 
 http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
 
  there is no value val1 in category1 facet
 
  query:
 
 http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1
 
  <int name="ngroups">4</int>
 
  So it looks like the detail query with fq=category1:val1 returns the relevant
  results, but Solr has a problem with faceting counts when one of the shards
  does not return a facet value (in this scenario, val1) that exists on
  other shards.
 
  I checked shards for val10 and I got:
 
  shard1: count for val10 - 142
  shard2: count for val10 - 131
  shard3: count for val10 -  149
  sum of counts 422 - ok.
 
  I'm not sure how to resolve this situation. The counts of val1 to
  val9 should certainly be different, and they should not be at the top of the
  category1 facet, because this is very confusing. Do you have any idea how to
  fix this problem?
 
  Best regards
  Agnieszka
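The shard numbers quoted in this thread make the expected behavior easy to check: the distributed merge should just sum per-shard group counts. A small sketch, using only the counts reported above:

```python
def merge_facets(shard_counts):
    """Naive distributed facet merge: sum each value's count across shards."""
    merged = {}
    for counts in shard_counts:
        for value, n in counts.items():
            merged[value] = merged.get(value, 0) + n
    return merged

# val10: every shard reports a group-based count, and the sum is correct
print(merge_facets([{"val10": 142}, {"val10": 131}, {"val10": 149}]))
# -> {'val10': 422}

# val1: the per-shard group counts reported above (11, 7, 4) sum to the
# fq-confirmed 22, not to the 9659 the distributed facet returned - which
# suggests the merge is not summing group counts when a shard omits the
# value from its first-pass response
print(merge_facets([{"val1": 11}, {"val1": 7}, {"val1": 4}]))
# -> {'val1': 22}
```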



Lost answers?

2012-07-15 Thread Bruno Mannina

Dear Solr Users,

I have Solr 3.6 + Tomcat, and I have a program that sends 4 HTTP
requests at the same time.

I must do 1902 requests.

I ran several tests, but each time some requests are lost:
- sometimes I get 1856 docs, 1895 docs, or 1900 docs, but never 1902 docs.

With Jetty, I always get 1902 docs.

As it's a dev environment, I'm alone in testing it.

Is it a problem for Tomcat 6 to handle 4 requests at the same time?

thanks for your info,

Bruno


Re: Facet on all the dynamic fields with *_s feature

2012-07-15 Thread Jack Krupansky
The answer appears to be No, but it's good to hear people express an 
interest in proposed features.


-- Jack Krupansky

-Original Message- 
From: Rajani Maski

Sent: Sunday, July 15, 2012 12:02 AM
To: solr-user@lucene.apache.org
Subject: Facet on all the dynamic fields with *_s feature

Hi All,

  Is this issue fixed in solr 3.6 or 4.0:  Faceting on all Dynamic field
with facet.field=*_s

  Link  :  https://issues.apache.org/jira/browse/SOLR-247



 If it is not fixed, any suggestion on how do I achieve this?


My requirement is just same as this one :
http://lucene.472066.n3.nabble.com/Dynamic-facet-field-tc2979407.html#none


Regards
Rajani 



Solr - Spatial Search for Specif Areas on Map

2012-07-15 Thread samabhiK
Hi,

I am new to Solr Spatial Search and would like to understand whether Solr can be
used successfully for very large data sets, in the range of 4 billion records.
I need to search some filtered data based on a region - maybe a set of
lat/lons or a polygon area. Is that possible in Solr? How fast is it with such a
data size? Will it be able to handle the load for 1 req/sec? If so, how?
Do you think Solr can beat the performance of PostGIS? As I am about to
choose the right technology for my new project, I need some expert comments
from the community.

Regards
Sam
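For the point-radius or box case specifically, Solr 3.1+ has filtering built in (polygon filtering needs the newer Solr 4 spatial field types); a hedged example, assuming a location-typed field named store:

```
fq={!geofilt sfield=store pt=45.15,-93.85 d=5}
fq={!bbox sfield=store pt=45.15,-93.85 d=5}
```

geofilt does a true distance filter within d kilometers of the point; bbox approximates it with a bounding box and is usually faster.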

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Nick Koton
 Do you have the following hard autoCommit in your config (as the stock
server does)?
 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

I have tried with and without that setting.  When I described running with
auto commit, that setting is what I mean.  I have varied the time in the
range 10,000-60,000 msec.  I have tried this setting with and without soft
commit in the server config file.

I have tried without this setting, but specifying the commitWithin time in
the solrj client's add method.

In both these cases, the client seems to overrun the server, and the server
runs out of memory.  One clarification I should make is that after the server
runs out of memory, the solrj client does NOT receive an error.
However, the documents indexed do not reliably appear in queries.

Approach #3 is to remove the autocommit in the server config, issue the add
method without commit within, but issue commits in the solrj client with
wait for sync and searcher set to true.  In case #3, I do not see the out of
memory in the server.  However, document index rates are restricted to about
1,000 per second.

 Nick
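Nick's second approach corresponds to passing commitWithin on the update itself; a hedged XML sketch (the document contents are illustrative):

```xml
<!-- POSTed to /update; the add becomes durable within 15 seconds
     without any explicit commit from the client -->
<add commitWithin="15000">
  <doc>
    <field name="id">doc1</field>
  </doc>
</add>
```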

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 5:15 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

Do you have the following hard autoCommit in your config (as the stock
server does)?

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

This is now fairly important since Solr now tracks information on every
uncommitted document added.
At some point we should probably hardcode some mechanism based on number of
documents or time.

-Yonik
http://lucidimagination.com



Re: Query results vs. facets results

2012-07-15 Thread Erick Erickson
q and fq queries don't necessarily run through the same query parser, see:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting

So try adding debugQuery=on to both queries you submitted. My guess
is that if you look at the parsed queries, you'll see something that explains
your differences. If not, paste the results back and we can take a look.

BTW, ignore all the explain bits for now, the important bit is the parsed form
of q and fq in your queries.

Best
Erick

On Sat, Jul 14, 2012 at 5:11 AM, tudor tudor.zaha...@gmail.com wrote:
 Hello,

 I am new to Solr and I am running some tests with our data in Solr. We are
 using version 3.6, and the data is imported from a DB2 database using Solr's
 DIH. We have defined a single entity in the db-data-config.xml, which is an
 equivalent of the following query:
 <entity name="connections"
         query="SELECT C.NAME,
                       F.CITY
                FROM NAME_CONNECTIONS AS C
                JOIN NAME_DETAILS AS F
                ON C.DETAILS_NAME = F.NAME">
 </entity>

 This might lead to some names appearing multiple times in the result set.
 This is OK.

 For the unique ID in the schema, we are using a solr.UUIDField:

 <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
 <field name="id" type="uuid" indexed="true" stored="true" default="NEW"/>

 All the searchable fields are declared as indexed and stored.

 I am aware of the fact that this is a very crude configuration, but for the
 tests that I am running it is fine.

 The problem that I have is the different result counts that I receive when I
 do equivalent queries for searching and faceting. For example, running the
 following query

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=100&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.ngroups=true&group.truncate=true

 yields

 <int name="ngroups">134</int>

 as a result, which is exactly what we expect.

 On the other hand, running

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true

 yields

 <lst name="facet_counts">
   <lst name="facet_queries"/>
   <lst name="facet_fields">
     <lst name="CITY">
       <int name="MILTON">103</int>

 I would expect to have the same number (134) in this facet result as well.
 Could you please let me know why these two results are different?

 Thank you,
 Tudor



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3994988.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Erick Erickson
Anything currently in the trunk will most probably be in the BETA and
in the eventual release. So I'd go with the trunk code. It'll always
be closer to the actual release than ALPHA or BETA

I know there've been some changes recently around exactly this - the
collection name. In fact there's a discussion about rearranging the whole
example directory.

Best
Erick

On Sat, Jul 14, 2012 at 9:54 PM, Roman Chyla roman.ch...@gmail.com wrote:
 Hi,

 Is it intentional that the ALPHA release has a different folder structure
 as opposed to the trunk?

 eg. collection1 folder is missing in the ALPHA, but present in branch_4x
 and trunk

 lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
 lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl


 This has consequences for development - e.g. solr testcases do not expect
 that the collection1 is there for ALPHA.

 In general, what is your advice for developers who are upgrading from solr
 3.x to solr 4.x? What codebase should we follow to minimize the pain of
 porting to the next BETA and stable releases?

 Thanks!

   roman


Re: Metadata and FullText, indexed at different times - looking for best approach

2012-07-15 Thread Erick Erickson
You've got a couple of choices. There's a new patch in town
https://issues.apache.org/jira/browse/SOLR-139
that allows you to update individual fields in a doc if (and only if)
all the fields in the original document were stored (actually, all the
non-copy fields).

So if you're storing (stored=true) all your metadata information, you can
just update the document when the  text becomes available assuming you
know the uniqueKey when you update.

Under the covers, this will find the old document, get all the fields, add the
new fields to it, and re-index the whole thing.
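With that patch, the later-arriving full text can be posted as just the changed field plus the uniqueKey; a hedged sketch of the XML form (field names are assumptions, and the update="set" attribute syntax shown is the form the 4.x work settled on - the patch itself may differ):

```xml
<add>
  <doc>
    <!-- uniqueKey identifies the existing document -->
    <field name="id">doc123</field>
    <!-- sets just this field; the other stored fields are preserved -->
    <field name="fulltext" update="set">extracted body text</field>
  </doc>
</add>
```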

Otherwise, your fallback idea is a good one.

Best
Erick

On Sat, Jul 14, 2012 at 11:05 PM, Alexandre Rafalovitch
arafa...@gmail.com wrote:
 Hello,

 I have a database of metadata and I can inject it into SOLR with DIH
 just fine. But then, I also have the documents to extract full text
 from that I want to add to the same records as additional fields. I
 think DIH allows to run Tika at the ingestion time, but I may not have
 the full-text files at that point (they could arrive days later). I
 can match the file to the metadata by a file name matching a field
 name.

 What is the best approach to do that staggered indexing with minimum
 custom code? I guess my fallback position is a custom full-text
 indexer agent that re-adds the metadata fields when the file is being
 indexed. Is there anything better?

 I am a newbie using v4.0alpha of SOLR (and loving it).

 Thank you,
 Alex.
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Jack Krupansky

Anything currently in the trunk ...

I think you mean Anything in the 4x branch, since trunk is 5x by 
definition.


But I'd agree that taking a nightly build or building from the 4x branch is 
likely to be a better bet than the old Alpha.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Sunday, July 15, 2012 11:02 AM
To: solr-user@lucene.apache.org
Subject: Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

Anything currently in the trunk will most probably be in the BETA and
in the eventual release. So I'd go with the trunk code. It'll always
be closer to the actual release than ALPHA or BETA

I know there've been some changes recently around exactly this - the
collection name. In fact there's a discussion about rearranging the whole
example directory.

Best
Erick

On Sat, Jul 14, 2012 at 9:54 PM, Roman Chyla roman.ch...@gmail.com wrote:

Hi,

Is it intentional that the ALPHA release has a different folder structure
as opposed to the trunk?

eg. collection1 folder is missing in the ALPHA, but present in branch_4x
and trunk

lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl


This has consequences for development - e.g. solr testcases do not expect
that the collection1 is there for ALPHA.

In general, what is your advice for developers who are upgrading from solr
3.x to solr 4.x? What codebase should we follow to minimize the pain of
porting to the next BETA and stable releases?

Thanks!

  roman 




Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Jack Krupansky
Maybe your rate of update is so high that the commit never gets a chance to 
run. So, maybe all these uncommitted updates are buffered up and using 
excess memory.


Try explicit commits from SolrJ, but less frequently. Or maybe if you just 
pause your updates periodically (every 30 seconds or so) the auto-commit 
would get a chance to occur. Although I have no idea how long a pause might 
be needed.


-- Jack Krupansky

-Original Message- 
From: Nick Koton

Sent: Sunday, July 15, 2012 10:52 AM
To: solr-user@lucene.apache.org ; yo...@lucidimagination.com
Subject: RE: SOLR 4 Alpha Out Of Mem Err


Do you have the following hard autoCommit in your config (as the stock

server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


I have tried with and without that setting.  When I described running with
auto commit, that setting is what I mean.  I have varied the time in the
range 10,000-60,000 msec.  I have tried this setting with and without soft
commit in the server config file.

I have tried without this setting, but specifying the commitWithin time in
the solrj client's add method.

In both these cases, the client seems to overrun the server, and the server
runs out of memory.  One clarification I should make is that after the server
runs out of memory, the solrj client does NOT receive an error.
However, the documents indexed do not reliably appear in queries.

Approach #3 is to remove the autocommit in the server config, issue the add
method without commit within, but issue commits in the solrj client with
wait for sync and searcher set to true.  In case #3, I do not see the out of
memory in the server.  However, document index rates are restricted to about
1,000 per second.

Nick

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 5:15 AM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

Do you have the following hard autoCommit in your config (as the stock
server does)?

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

This is now fairly important since Solr now tracks information on every
uncommitted document added.
At some point we should probably hardcode some mechanism based on number of
documents or time.

-Yonik
http://lucidimagination.com 



Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Yonik Seeley
On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton nick.ko...@gmail.com wrote:
 Do you have the following hard autoCommit in your config (as the stock
 server does)?
 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 I have tried with and without that setting.  When I described running with
 auto commit, that setting is what I mean.

OK cool.  You should be able to run the stock server (i.e. with this
autocommit) and blast in updates all day long - it looks like you have
more than enough memory.  If you can't, we need to fix something.  You
shouldn't need explicit commits unless you want the docs to be
searchable at that point.

 Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that?  How many threads at once are sending docs to
solr?  Is each request a single doc or multiple?

-Yonik
http://lucidimagination.com


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Yonik Seeley
On Sun, Jul 15, 2012 at 12:52 PM, Jack Krupansky
j...@basetechnology.com wrote:
 Maybe your rate of update is so high that the commit never gets a chance to
 run.

I don't believe that is possible.  If it is, it should be fixed.

-Yonik
http://lucidimagination.com


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Jack Krupansky

Agreed. That's why I say maybe. Clearly something sounds amiss here.

-- Jack Krupansky

-Original Message- 
From: Yonik Seeley

Sent: Sunday, July 15, 2012 12:06 PM
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

On Sun, Jul 15, 2012 at 12:52 PM, Jack Krupansky
j...@basetechnology.com wrote:
Maybe your rate of update is so high that the commit never gets a chance 
to

run.


I don't believe that is possible.  If it is, it should be fixed.

-Yonik
http://lucidimagination.com 



Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Mark Miller
The beta will have files that were in solr/conf and solr/data in
solr/collection1/conf|data instead.

What Solr test cases are you referring to? The only ones that should care about 
this would have to be looking at the file system. If that is the case, simply 
update the path. The built in tests had to be adjusted for this as well.

The problem with having the default core use /solr as a conf dir is that if you
create another core, where does it logically go? The default collection is
called collection1, so now its conf and data live in a folder called
collection1. A new SolrCore called newsarticles would have its conf and data
in /solr/newsarticles.

There are still going to be some bumps as you move from alpha to beta to 
release if you are depending on very specific file system locations - however, 
they should be small bumps that are easily handled.

Just send an email to the user list if you'd like some help with anything in 
particular.

In this case, I'd update what you have to look at /solr/collection1 rather than 
simply /solr. It's still the default core, so simple URLs without the core name 
will still work. It won't affect HTTP communication. Just file system location.

On Jul 14, 2012, at 9:54 PM, Roman Chyla wrote:

 Hi,
 
 Is it intentional that the ALPHA release has a different folder structure
 as opposed to the trunk?
 
 eg. collection1 folder is missing in the ALPHA, but present in branch_4x
 and trunk
 
 lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
 lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl
 
 
 This has consequences for development - e.g. solr testcases do not expect
 that the collection1 is there for ALPHA.
 
 In general, what is your advice for developers who are upgrading from solr
 3.x to solr 4.x? What codebase should we follow to minimize the pain of
 porting to the next BETA and stable releases?
 
 Thanks!
 
  roman

- Mark Miller
lucidimagination.com













Re: Index version on slave incrementing to higher than master

2012-07-15 Thread Andrew Davidoff
Erick,

Thank you. I think originally my thought was that if I had my slave
configuration really close to my master config, it would be very easy to
promote a slave to a master (and vice versa) if necessary. But I think you
are correct that ripping out from the slave config anything that would
modify an index in any way makes sense. I will give this a try very soon.

Thanks again.
Andy


On Sat, Jul 14, 2012 at 5:22 PM, Erick Erickson erickerick...@gmail.comwrote:

 Gotta admit it's a bit puzzling, and surely you want to move to the 3x
 versions G..

 But at a guess, things might be getting confused on the slaves given
 you have a merge policy on them. There's no reason to have any
 policies on the slaves; slaves should just be about copying the files
 from the master, all the policies,commits,optimizes should be done on
 the master. About all the slave does is copy the current state of the index
 from the master.

 So I'd try removing everything but the replication from the slaves,
 including
  any autocommit stuff and just let replication do its thing.

 And I'd replicate after the optimize if you keep the optimize going. You
 should
 end up with one segment in the index after that, on both the master and
 slave.
 You can't get any more merged than that.

 Of course you'll also copy the _entire_ index every time after you've
 optimized...

 Best
 Erick
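A slave config stripped down along these lines keeps little more than the replication handler; a hedged sketch (the master URL and poll interval are placeholders):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- where to pull index files from -->
    <str name="masterUrl">http://master:8983/solr/replication</str>
    <!-- how often to poll the master, hh:mm:ss -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```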

 On Fri, Jul 13, 2012 at 12:31 AM, Andrew Davidoff david...@qedmf.net
 wrote:
  Hi,
 
  I am running solr 1.4.0+ds1-1ubuntu1. I have a master server that has a
  number of solr instances running on it (150 or so), and nightly most of
  them have documents written to them. The script that does these writes
  (adds) does a commit and an optimize on the indexes when it's entirely
  finished updating them, then initiates replication on the slave per
  instance. In this configuration, the index versions between master and
  slave remain in synch.
 
  The optimize portion, which, again, happens nightly, is taking a lot of
  time and I think it's unnecessary. I was hoping to stop doing this
 explicit
  optimize, and to let my merge policy handle that. However, if I don't do
 an
  optimize, and only do a commit before initiating slave replication, some
  hours later the slave is, for reasons that are unclear to me,
 incrementing
  its index version to 1 higher than the master.
 
  I am not really sure I understand the logs, but it looks like the
  incremented index version is the result of an optimize on the slave, but
 I
  am never issuing any commands against the slave aside from initiating
  replication, and I don't think there's anything in my solr configuration
  that would be initiating this. I do have autoCommit on with maxDocs of
  1000, but since I am initiating slave replication after doing a commit on
  the master, I don't think there would ever be any uncommitted documents
 on
  the slave. I do have a merge policy configured, but it's not clear to me
  that it has anything to do with this. And if it did, I'd expect to see
  similar behavior on the master (right?).
 
  I have included a snippet from my slave logs that shows this issue. In this
  snippet, index version 1286065171264 is what the master has,
  and 1286065171265 is what the slave increments itself to, which is then out
  of synch with the master in terms of version numbers. Nothing that I know
  of is issuing any commands to the slave at this time. If I understand
 these
  logs (I might not), it looks like something issued an optimize that took
  1023720ms? Any ideas?
 
  Thanks in advance.
 
  Andy
 
 
 
  Jul 12, 2012 12:21:14 PM org.apache.solr.update.SolrIndexWriter close
  FINE: Closing Writer DirectUpdateHandler2
  Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy onCommit
  INFO: SolrDeletionPolicy.onCommit: commits:num=2
 
 
 commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h8,version=1286065171264,generation=620,filenames=[_h6.fnm,
  _h5.nrm, segments_h8, _h4.nrm, _h5.tii, _h4
  .tii, _h5.tis, _h4.tis, _h4.fdx, _h5.fnm, _h6.tii, _h4.fdt, _h5.fdt,
  _h5.fdx, _h5.frq, _h4.fnm, _h6.frq, _h6.tis, _h4.prx, _h4.frq, _h6.nrm,
  _h5.prx, _h6.prx, _h6.fdt, _h6
  .fdx]
 
 
 commit{dir=/var/lib/ontolo/solr/o_3952/index,segFN=segments_h9,version=1286065171265,generation=621,filenames=[_h7.tis,
  _h7.fdx, _h7.fnm, _h7.fdt, _h7.prx, segment
  s_h9, _h7.nrm, _h7.tii, _h7.frq]
  Jul 12, 2012 12:21:14 PM org.apache.solr.core.SolrDeletionPolicy
  updateCommits
  INFO: newest commit = 1286065171265
  Jul 12, 2012 12:21:14 PM org.apache.solr.search.SolrIndexSearcher init
  INFO: Opening Searcher@4ac62082 main
  Jul 12, 2012 12:21:14 PM org.apache.solr.update.DirectUpdateHandler2
 commit
  INFO: end_commit_flush
  Jul 12, 2012 12:21:14 PM org.apache.solr.search.SolrIndexSearcher warm
  INFO: autowarming Searcher@4ac62082 main from Searcher@48d901f7 main
 
 
 

Re: Lost answers?

2012-07-15 Thread Bruno Mannina

I forgot:

I do the request on the uniqueKey field, so each request gets one document

On 15/07/2012 14:11, Bruno Mannina wrote:

Dear Solr Users,

I have Solr 3.6 + Tomcat, and I have a program that sends 4 HTTP
requests at the same time.

I must do 1902 requests.

I ran several tests, but each time some requests are lost:
- sometimes I get 1856 docs, 1895 docs, or 1900 docs, but never 1902 docs.

With Jetty, I always get 1902 docs.

As it's a dev environment, I'm alone in testing it.

Is it a problem for Tomcat 6 to handle 4 requests at the same time?

thanks for your info,

Bruno







RE: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Nick Koton
 Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that?  How many threads at once are sending docs to solr?
Is each request a single doc or multiple?
I realize, after the fact, that my solrj client is much like
org.apache.solr.client.solrj.LargeVolumeTestBase.  The number of threads is
configurable at run time, as are the various commit parameters.  Most of the
tests have been in the 4-16 threads range.  Most of my testing has been with
the single-document SolrServer::add(SolrInputDocument doc) method.  When I
realized what LargeVolumeTestBase is doing, I converted my program to use
the SolrServer::add(Collection<SolrInputDocument> docs) method with 100
documents in each add batch.  Unfortunately, the out-of-memory errors still
occur without client-side commits.
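The batch-of-100 pattern described above is just chunking the document stream; a minimal sketch of it (generic Python, standing in for the SolrJ add calls, which this is not):

```python
def chunked(docs, size=100):
    """Yield successive batches of `size` documents from an iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Each batch would become one add(Collection<SolrInputDocument>) call.
batches = list(chunked(range(250), size=100))
print([len(b) for b in batches])
```

With 250 documents this produces two full batches and one partial one, which is the shape the add calls take.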

If you agree my three approaches to committing are logical, would it be
useful for me to try to reproduce this with the example schema in a small
cloud configuration using LargeVolumeTestBase or the like?  It will take me
a couple of days to work it in.  Or perhaps this sort of test is already run?

Best 
Nick

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Sunday, July 15, 2012 11:05 AM
To: Nick Koton
Cc: solr-user@lucene.apache.org
Subject: Re: SOLR 4 Alpha Out Of Mem Err

On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton nick.ko...@gmail.com wrote:
 Do you have the following hard autoCommit in your config (as the 
 stock
 server does)?
 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 I have tried with and without that setting.  When I described running 
 with auto commit, that setting is what I mean.

OK cool.  You should be able to run the stock server (i.e. with this
autocommit) and blast in updates all day long - it looks like you have more
than enough memory.  If you can't, we need to fix something.  You shouldn't
need explicit commits unless you want the docs to be searchable at that
point.

 Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that?  How many threads at once are sending docs to solr?
Is each request a single doc or multiple?

-Yonik
http://lucidimagination.com



JRockit with SOLR3.4/3.5

2012-07-15 Thread Salman Akram
We used JRockit with SOLR1.4 because the default JVM had memory issues (not 
only did it consume more memory, it also didn't stay within the max memory 
allocated to Tomcat, whereas JRockit did). However, JRockit gives an error 
while using it with SOLR3.4/3.5. Any ideas why?

*** This Message Has Been Sent Using BlackBerry Internet Service from Mobilink 
***


Re: Query results vs. facets results

2012-07-15 Thread tudor
Hi Eric,

Thanks for the reply.

The query:
 
http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

yields this in the debug section:

<lst name="debug">
  <str name="rawquerystring">CITY:MILTON</str>
  <str name="querystring">CITY:MILTON</str>
  <str name="parsedquery">CITY:MILTON</str>
  <str name="parsedquery_toString">CITY:MILTON</str>
  <str name="QParser">LuceneQParser</str>

in the explain section. There is no information about grouping.

Second query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields this in the debug section:

<lst name="debug">
  <str name="rawquerystring">*</str>
  <str name="querystring">*</str>
  <str name="parsedquery">ID:*</str>
  <str name="parsedquery_toString">ID:*</str>
  <str name="QParser">LuceneQParser</str>

To be honest, these do not tell me too much. I would like to see some
information about the grouping, since I believe this is where I am missing
something.

In the mean time, I have combined the two queries above, hoping to make some
sense out of the results. The following query filters all the entries with
the city name MILTON and groups together the ones with the same ID. Also,
the query facets the entries on city, grouping the ones with the same ID. So
the results numbers refer to the number of groups.

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields the same (for me perplexing) results:

<lst name="grouped">
  <lst name="ID">
    <int name="matches">284</int>
    <int name="ngroups">134</int>

(i.e.: fq says: 134 groups with CITY:MILTON)
...

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
   ...
  <int name="MILTON">103</int>

(i.e.: faceted search says: 103 groups with CITY:MILTON)

I really believe that these different results have something to do with the
grouping that Solr makes, but I do not know how to dig into this.
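One way to dig into it is a toy model of the two counts. With group.truncate=true, facet counts are based on only the most relevant document of each group, whereas ngroups counts every group containing a match, so the numbers can legitimately differ. A sketch with made-up data (not Solr internals):

```python
# Toy data: (group_id, city) pairs; the first doc listed per group
# stands in for that group's most relevant document.
docs = [(1, "MILTON"), (1, "MILTON"),
        (2, "BOSTON"), (2, "MILTON"),
        (3, "MILTON")]

# ngroups for fq=CITY:MILTON: groups containing at least one MILTON doc
ngroups = len({g for g, c in docs if c == "MILTON"})

# group.truncate=true faceting: count only each group's top document
first_doc = {}
for g, c in docs:
    first_doc.setdefault(g, c)
truncated_milton = sum(1 for c in first_doc.values() if c == "MILTON")

print(ngroups, truncated_milton)  # 3 vs 2: truncated facets can be lower
```

If that model matches Solr's behavior, the 134 vs 103 gap would just mean 31 groups contain a MILTON document without MILTON being their top document.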

Thank you and best regards,
Tudor

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995150.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: JRockit with SOLR3.4/3.5

2012-07-15 Thread Michael Della Bitta
Hello, Salman,

It would probably be helpful if you included the text/stack trace of
the error you're encountering, plus any other pertinent system
information you can think of.

One thing to remember is the memory usage you tune with Xmx is only
the maximum size of the heap, and there are other types of memory
usage by the JVM that don't fall under that (Permgen space, memory
mapped files, etc).

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 3:19 PM, Salman Akram
salman.ak...@northbaysolutions.net wrote:
 We used JRockit with SOLR1.4 as default JVM had mem issues (not only it was 
 consuming more mem but didn't restrict to the max mem allocated to tomcat - 
 jrockit did restrict to max mem). However, JRockit gives an error while using 
 it with SOLR3.4/3.5. Any ideas, why?

 *** This Message Has Been Sent Using BlackBerry Internet Service from 
 Mobilink ***


Re: SOLR 4 Alpha Out Of Mem Err

2012-07-15 Thread Michael Della Bitta
unable to create new native thread

That suggests you're running out of threads, not RAM. Possibly you're
using a multithreaded collector, and it's pushing you over the top of
how many threads your OS lets a single process allocate? Or somehow
the thread stack size is set too high?

More here: 
http://stackoverflow.com/questions/763579/how-many-threads-can-a-java-vm-support
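The back-of-envelope reasoning behind that limit, with assumed numbers (a sketch, not a measurement): the thread count ceiling is roughly the address space left after the heap and permgen, divided by the thread stack size.

```python
def max_threads(process_mem_mb, heap_mb, permgen_mb, stack_kb):
    """Rough ceiling on native threads: leftover memory / stack size."""
    leftover_kb = (process_mem_mb - heap_mb - permgen_mb) * 1024
    return leftover_kb // stack_kb

# e.g. a 32-bit JVM: ~3 GB usable address space, 2 GB heap,
# 256 MB permgen, 512 KB default thread stacks (all assumptions)
print(max_threads(3072, 2048, 256, 512))
```

Under those assumed numbers you hit a ceiling in the low thousands of threads, which is why a large heap can paradoxically make "unable to create new native thread" more likely.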

Michael Della Bitta


Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Sun, Jul 15, 2012 at 2:45 PM, Nick Koton nick.ko...@gmail.com wrote:
 Solrj multi-threaded client sends several 1,000 docs/sec

Can you expand on that?  How many threads at once are sending docs to solr?
 Is each request a single doc or multiple?
 I realize, after the fact, that my solrj client is much like
 org.apache.solr.client.solrj.LargeVolumeTestBase.  The number of threads is
 configurable at run time as are the various commit parameters.  Most of the
 test have been in the 4-16 threads range.  Most of my testing has been with
 the single document SolrServer::add(SolrInputDocument doc )method.  When I
 realized what LargeVolumeTestBase is doing, I converted my program to use
 the SolrServer::add(Collection<SolrInputDocument> docs) method with 100
 documents in each add batch.  Unfortunately, the out of memory errors still
 occur without client side commits.

 If you agree my three approaches to committing are logical, would it be
 useful for me to try to reproduce this with example schema in a small
 cloud configuration using LargeVolumeTestBase or the like?  It will take me
 a couple days to work it in.  Or perhaps this sort of test is already run?

 Best
 Nick

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Sunday, July 15, 2012 11:05 AM
 To: Nick Koton
 Cc: solr-user@lucene.apache.org
 Subject: Re: SOLR 4 Alpha Out Of Mem Err

 On Sun, Jul 15, 2012 at 11:52 AM, Nick Koton nick.ko...@gmail.com wrote:
 Do you have the following hard autoCommit in your config (as the
 stock
 server does)?
 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 I have tried with and without that setting.  When I described running
 with auto commit, that setting is what I mean.

 OK cool.  You should be able to run the stock server (i.e. with this
 autocommit) and blast in updates all day long - it looks like you have more
 than enough memory.  If you can't, we need to fix something.  You shouldn't
 need explicit commits unless you want the docs to be searchable at that
 point.

 Solrj multi-threaded client sends several 1,000 docs/sec

 Can you expand on that?  How many threads at once are sending docs to solr?
 Is each request a single doc or multiple?

 -Yonik
 http://lucidimagination.com



Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Roman Chyla
I am using AbstractSolrTestCase (which in turn uses
solr.util.TestHarness) as a basis for unittests, but the solr
installation is outside of my source tree and I don't want to
duplicate it just to change a few lines (and with the new solr4.0 I
hope I can get the test-framework in a jar file, previously that
wasn't possible). So in essence, I have to deal with the expected
folder structure for all my unittests.

The way I make the configuration visible outside the standard Solr
paths is to get the classloader and add folders to it; this way tests
can extend Solr without duplicating the same configuration. But I
should mimic the folder structure to stay compatible.

Thanks all for your help, it is much appreciated.

roman

On Sun, Jul 15, 2012 at 1:46 PM, Mark Miller markrmil...@gmail.com wrote:
 The beta will have files that were in solr/conf and solr/data in 
 solr/collection1/conf|data instead.

 What Solr test cases are you referring to? The only ones that should care 
 about this would have to be looking at the file system. If that is the case, 
 simply update the path. The built in tests had to be adjusted for this as 
 well.

 The problem with having the default core use /solr as a conf dir is that if 
 you create another core, where does it logically go? The default collection 
 is called collection1, so now its conf and data live in a folder called 
 collection1. A new SolrCore called newsarticles would have its conf and data 
 in /solr/newsarticles.

 There are still going to be some bumps as you move from alpha to beta to 
 release if you are depending on very specific file system locations - 
 however, they should be small bumps that are easily handled.

 Just send an email to the user list if you'd like some help with anything in 
 particular.

 In this case, I'd update what you have to look at /solr/collection1 rather 
 than simply /solr. It's still the default core, so simple URLs without the 
 core name will still work. It won't affect HTTP communication. Just file 
 system location.
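 The layout Mark describes -- the default collection1 core plus a 
 hypothetical newsarticles core -- would look roughly like this on disk 
 (a sketch based on the mail, not authoritative):

 solr/
   collection1/
     conf/
     data/
   newsarticles/
     conf/
     data/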

 On Jul 14, 2012, at 9:54 PM, Roman Chyla wrote:

 Hi,

 Is it intentional that the ALPHA release has a different folder structure
 as opposed to the trunk?

 eg. collection1 folder is missing in the ALPHA, but present in branch_4x
 and trunk

 lucene-trunk/solr/example/solr/collection1/conf/xslt/example_atom.xsl
 4.0.0-ALPHA/solr/example/solr/conf/xslt/example_atom.xsl
 lucene_4x/solr/example/solr/collection1/conf/xslt/example_atom.xsl


 This has consequences for development - e.g. solr testcases do not expect
 that the collection1 is there for ALPHA.

 In general, what is your advice for developers who are upgrading from solr
 3.x to solr 4.x? What codebase should we follow to minimize the pain of
 porting to the next BETA and stable releases?

 Thanks!

  roman

 - Mark Miller
 lucidimagination.com


are stopwords indexed?

2012-07-15 Thread Giovanni Gherdovich
Hi all,

are stopwords from the stopwords.txt config file
supposed to be indexed?

I would say no, but this is the situation I am
observing on my Solr instance:

* I have a bunch of stopwords in stopwords.txt
* my fields are of fieldType text from the example schema.xml,
  i.e. I have

-- -- 8< -- -- 8< -- -- 8< -- -- 8<
   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
[...]
<filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords_FR.txt"
        enablePositionIncrements="true"
/>
[...]
  </analyzer>
  <analyzer type="query">
 [...]
 <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="stopwords_FR.txt"
        enablePositionIncrements="true"
/>
  </analyzer>
   </fieldType>
-- -- 8< -- -- 8< -- -- 8< -- -- 8<

* searching for a stopword through Solr always gives zero results
* inspecting the index with LuCLI
http://manpages.ubuntu.com/manpages/natty/man1/lucli.1.html
  shows that all stopwords are in my index. Note that I query
  LuCLI specifying the field, i.e. with myFieldName:and,
  and not just with the stopword and.
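To double-check my understanding with a toy model: a stop filter drops stopwords only when the analyzer actually runs, so documents indexed before the filter (or the stopword list) was in place keep them until reindexed. A sketch (toy analyzer, not Lucene code):

```python
STOPWORDS = {"and", "the", "de", "la"}

def analyze(text, stop=True):
    """Toy index-time analyzer: lowercase, whitespace split, optional stop filter."""
    tokens = text.lower().split()
    return [t for t in tokens if not (stop and t in STOPWORDS)]

with_filter    = analyze("Paris and the Seine")            # stopwords removed
without_filter = analyze("Paris and the Seine", stop=False)  # stopwords kept
print(with_filter)
print(without_filter)
```

If the index was built while the second path was in effect, the stopwords stay in the index even though query-time stop filtering makes them unfindable.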

Is this normal?

Are stopwords indexed?

Cheers,
Giovanni


Re: Solr - Spatial Search for Specif Areas on Map

2012-07-15 Thread David Smiley (@MITRE.org)
Sam,

These are big numbers you are throwing around, especially the query volume. 
How big are these records that you have 4 billion of -- or put another way,
how much space would it take up in a pure form like in CSV?  And should I
assume the searches you are doing are more than geospatial?  In any case, a
Solr solution here is going to involve many machines.  The biggest number
you propose is 10k queries per second which is hard to imagine.

I've seen some say Solr 4 might have 100M records per shard, although there
is a good deal of variability -- as usual, YMMV.  But let's go with that for
this paper-napkin calculation.  You would need 40 shards of 100M documents
each to get to 4000M (4B) documents.  That is a lot of shards, but people
have done it, I believe.  This scales out to your document collection but
not up to your query volume which is extremely high.  I have some old
benchmarks suggesting ~10ms geo queries on spatial queries for SOLR-2155
which was rolled into the spatial code in Lucene 4 (Solr adapters are on the
way).  But for a full query overhead and for a safer estimate, let's say
50ms.  So perhaps you might get 20 concurrent queries per second (which
seems high but we'll go with it).  But you require 10k/sec(!) so this means
you need 500 times the 20qps which means 500 *times* the base hardware to
support the 40 shards I mentioned before.  In other words, the 4B documents
need to be replicated 500 times to support 10k/second queries.  So
theoretically, we're talking 500 clusters, each cluster having 40 shards --
at ~4 shards/machine this is 10 machines per cluster: 5,000 machines in
total.  Wow.  Doesn't seem realistic.  If you have a reference to some
system or person's experience with any system that can, Solr or not, then
please share.
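Spelling that napkin math out (every input below is one of the assumptions above, not a measurement):

```python
docs            = 4_000_000_000   # total records
docs_per_shard  = 100_000_000     # rough Solr 4 per-shard capacity
query_latency_s = 0.050           # safe full-query estimate
target_qps      = 10_000
shards_per_machine = 4

shards = docs // docs_per_shard                       # shards for one full copy
qps_per_cluster = round(1 / query_latency_s)          # ~20 queries/sec per copy
clusters = target_qps // qps_per_cluster              # replicated copies needed
machines = clusters * (shards // shards_per_machine)  # 10 machines per copy
print(shards, qps_per_cluster, clusters, machines)
```

Change any assumption (latency, per-shard capacity, shards per machine) and the machine count moves linearly, which is why measuring one optimized shard first matters so much.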

If you or anyone were to attempt to see if Solr scale's for their needs, a
good approach is to consider just one shard non-replicated, or even better a
handful that would all exist on one machine.  Optimize it as much as you
can.  Then see how much data you can put on this machine and with what
query-volume.  From this point, it's basic math to see how many more such
machines are required to scale out to your data size and up to your query
volume.

Care to explain why so much data needs to be searched at such a volume? 
Maybe you work for Google ;-)

To your question on scalability vs PostGIS, I think Solr shines in its
ability to scale out if you have the resources to do it.

~ David Smiley

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Spatial-Search-for-Specif-Areas-on-Map-tp3995051p3995197.html
Sent from the Solr - User mailing list archive at Nabble.com.


Wildcard query vs facet.prefix for autocomplete?

2012-07-15 Thread santamaria2
I'm about to implement an autocomplete mechanism for my search box. I've read
about some of the common approaches, but I have a question about wildcard
query vs facet.prefix.

Say I want autocomplete for a title: 'Shadows of the Damned'. I want this to
appear as a suggestion if I type 'sha' or 'dam' or 'the'. I don't care that
it won't appear if I type 'hadows'. 

While indexing, I'd use a whitespace tokenizer and a lowercase filter to
store that title in the index.
Now I'm thinking two approaches for 'dam' typed in the search box:

1) q=title:dam*

2) q=*:*&facet=on&facet.field=title&facet.prefix=dam


So any reason that I should favour one over the other? Speed a factor? The
index has around 200,000 items.
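For what it's worth, both options reduce to a prefix test over the lowercased tokens, but they return different things. A toy model of what each matches (made-up titles, not Solr internals):

```python
titles = ["Shadows of the Damned", "Damnation Alley", "Shallow Grave"]

def tokens(title):
    # whitespace tokenizer + lowercase filter, as in the indexing setup above
    return title.lower().split()

def wildcard_matches(prefix):
    # q=title:dam*  ->  whole documents with any token starting with the prefix
    return [t for t in titles if any(tok.startswith(prefix) for tok in tokens(t))]

def facet_prefix(prefix):
    # facet.prefix=dam  ->  distinct index terms starting with the prefix, with counts
    counts = {}
    for t in titles:
        for tok in tokens(t):
            if tok.startswith(prefix):
                counts[tok] = counts.get(tok, 0) + 1
    return counts

print(wildcard_matches("dam"))  # documents
print(facet_prefix("dam"))      # term suggestions with counts
```

So the choice partly depends on whether the suggestions should be whole titles (wildcard) or individual terms with counts (facet.prefix), on top of any speed difference.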

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Wildcard-query-vs-facet-prefix-for-autocomplete-tp3995199.html
Sent from the Solr - User mailing list archive at Nabble.com.