Facet Query

2011-02-03 Thread Bagesh Sharma

Hi, do the facet.query and fq parameters work only for range queries? Can I make
a general query with them, like facet.query=city:mumbai, and get
results back? Please suggest.
When I made this query I only got a count back for it. How can I get the
documents for it?
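For reference, facet.query only ever adds a count to the facet_counts section of
the response; to retrieve the matching documents themselves, the same clause can
be applied as a filter query. A sketch of such a request, with host assumed and
the field name taken from the question:

http://localhost:8983/solr/select?q=*:*&fq=city:mumbai&facet=true&facet.query=city:mumbai

Here fq restricts the returned documents to those matching city:mumbai, while
facet.query still reports only a count for that clause.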


Problem in faceting

2011-02-03 Thread Bagesh Sharma

Dear sir, I have a problem with faceting.
 I am searching for the text "water treatment plant" on Solr using the dismax request
handler.

The final query which goes to Solr is:




+((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 |
TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water |
TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 |
TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 |
TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 |
TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 |
TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 |
TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0
| TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2
(TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 |
TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant |
TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 |
TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 |
TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:"water treatment
plant"^10.0 | TDR_SUBIND_COMP_NAME:"water treatment plant"^20.0 |
TDR_SUBIND_SUBTDR_SHORT:"water treatment plant"^15.0)~0.2





Now I want to facet over only those results which contain the complete text
"water treatment plant", i.e. the records which match "water treatment
plant" in full. I do not want to facet on results which match only 1 or
2 words, like "water" or "treatment". But with the above query I am
not able to achieve this.

The Main Problem:

There is a field FACET_CITY in my schema.xml, and I want to find out only
those cities for which the complete text "water treatment plant"
matches. I don't want those cities for which only the words "water" or "treatment"
match.

I have two possibilities to achieve this functionality -
1. Somehow find the list of cities for which the complete text
matches, i.e. facet only on the documents that match the complete text,
 OR
2. Facet over only the first 100 documents for the cities list; it could be the
first 100 documents with the highest score.

Please suggest how I can achieve this.
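One way to get the first option, sketched with the field names from the message
above (the exact behaviour depends on the fields' analysis): apply the phrase as
a filter query, so that only complete-phrase matches survive into the facet
counts.

http://localhost:8983/solr/select?q=water treatment plant&defType=dismax&fq=TDR_SUBIND_SUBTDR_DETAILS:"water treatment plant"&facet=true&facet.field=FACET_CITY

The fq clause keeps only documents whose TDR_SUBIND_SUBTDR_DETAILS field contains
the whole phrase, and the FACET_CITY counts are then computed over that
restricted set only.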


Re: Solr faceting on score

2011-02-03 Thread Bagesh Sharma

Thanks for the reply


Re: Solr faceting on score

2011-02-03 Thread Grijesh

No, you cannot get facets on score. Score is not defined in the schema, and as
far as I know facets can only be computed on fields defined in the schema.

-
Thanx:
Grijesh
http://lucidimagination.com


Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-03 Thread Dominique Bejean

Hi,

I would not try to change the lucene version in Solr 1.4.1 from 2.9.x to 
3.0.x.


As Koji said, the best solution is to get the branch_3x or trunk sources 
and build them. You need svn and ant.


1. Create a working directory

$ mkdir ~/solr

2. Get the source

$ cd ~/solr

$ svn co http://svn.apache.org/repos/asf/lucene/dev/trunk
or
$ svn co http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

3. Build (the checkout creates a subdirectory, ~/solr/trunk or ~/solr/branch_3x)

$ cd ~/solr/trunk/lucene
$ ant dist
$ cd ~/solr/trunk/modules
$ ant dist
$ cd ~/solr/trunk/solr
$ ant dist

Dominique

On 02/02/11 12:47, Churchill Nanje Mambe wrote:

thanks guys
  I will try the trunk

as for unpacking the war and changing the lucene jar... I am not an expert, and
this may get complicated for me; maybe over time,
when I am more comfortable

Mambe Churchill Nanje
237 33011349,
AfroVisioN Founder, President,CEO
http://www.afrovisiongroup.com | http://mambenanje.blogspot.com
skypeID: mambenanje
www.twitter.com/mambenanje



On Wed, Feb 2, 2011 at 8:03 AM, Grijesh  wrote:


You can extract the solr.war using java's jar -xvf solr.war command,

replace the lucene-2.9.jar with your lucene-3.0.3.jar in the WEB-INF/lib
directory,

then use jar -cf solr.war * to pack the war again,

and deploy that war. Hope that works.

-
Thanx:
Grijesh



Re: Using terms and N-gram

2011-02-03 Thread openvictor Open
Okay, so as suggested, shingles work perfectly well for what I need!
Thank you Erick

2011/2/3 openvictor Open 

> Thank you for these inputs.
>
> I was silly asking for ngrams because I already knew it. I think I was
> tired yesterday...
>
> Thank you Eric Erickson, once again you gave me a more than useful comment.
> Indeed Shingles seems to be the perfect fit for the work I want to do. I
> will try to implement that tonight and I will come back to see if it's
> working.
>
> Regards,
> Victor
>
> 2011/2/3 Erick Erickson 
>
> First, you'll get a lot of insight by defining something simply and looking
>> at the analysis page from solr admin. That's a very valuable page.
>>
>> To your question:
>> commongrams are "shingles" that work between stopwords and
>> other words. For instance, "this is some text" gets analyzed into
>> this, this_is, is, is_some, some text. Note that the stopwords
>> are the only things that get combined with the text after.
>>
>> NGrams form on letters. It's too long to post the whole thing, but
>> the above phrase gets analyzed as
>> t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me.. It splits a
>> single
>> token into grams whereas commongrams essentially combines tokens
>> when they're stopwords.
>>
>> Have you looked at "shingles"? See:
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>> Best
>> Erick
>>
>>
>> On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open > >wrote:
>>
>> > Thank you, I will do that and hopefully it will be handy!
>> >
>> > But can someone explain to me the difference between CommonGramsFilterFactory and
>> > NGramFilterFactory? (Maybe the solution is there)
>> >
>> > Thank you all,
>> > best regards
>> >
>> > 2011/2/3 Grijesh 
>> >
>> > >
>> > > Use analysis.jsp to see what happening at index time and query time
>> with
>> > > your
>> > > input data.You can use highlighting to see if match found.
>> > >
>> > > -
>> > > Thanx:
>> > > Grijesh
>> > > http://lucidimagination.com
>> > > --
>> > > View this message in context:
>> > >
>> >
>> http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html
>> > > Sent from the Solr - User mailing list archive at Nabble.com.
>> > >
>> >
>>
>
>


Re: Use Parallel Search

2011-02-03 Thread Ganesh
I am having a similar kind of problem. I need to scale out. Could you explain how 
you have done distributed indexing and search using Lucene?

Regards
Ganesh  

- Original Message - 
From: "Gustavo Maia" 
To: 
Sent: Thursday, February 03, 2011 11:36 PM
Subject: Use Parallel Search


> Hello,
> 
> Let me give a brief description of my scenario.
> Today I am only using Lucene 2.9.3. I have an index of 30 million documents
> distributed on three machines and each machine with 6 hds (15k rpm).
> The server queries the search index using the remote class search. And each
> machine is made to search using the parallel search (search simultaneously
> in 6 hds).
> So during the search are simulating using the three machines and 18 hds,
> returning me to a very good response time.
> 
> 
> Today I am studying the SOLR and am interested in knowing more about the
> searches and use of distributed parallel search on the same machine. What
> would be the best scenario using SOLR that is better than I already am using
> today only with lucene?
>  Note: do I need to have 6 Solr instances installed on each machine, one
> for each hd? Or is there some other alternative way for me to use
> the 6 hds without having 6 instances of Solr server?
> 
>  Another question would be if the SOLR would have some limiting size index
> for Hard drive? It would be interesting not index too big because when the
> index increased the longer the search.
> 
> Thanks for everything.
> 
> 
> Gustavo Maia
>


Re: facet.mincount

2011-02-03 Thread Isan Fulia
Thanks to all

On 3 February 2011 20:21, Grijesh  wrote:

>
> Hi
>
> facet.mincount does not work with the facet.date option, afaik.
> There is an issue filed for it, SOLR-343. Trying the patch provided as a
> solution in that issue may solve the
> problem.
> The fix version for it may be 1.5
>
> -
> Thanx:
> Grijesh
> http://lucidimagination.com
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/facet-mincount-tp2411930p2414232.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks & Regards,
Isan Fulia.


Re: HTTP ERROR 400 undefined field: *

2011-02-03 Thread Chris Hostetter

: I was working on an checkout of the 3.x branch from about 6 months ago.
: Everything was working pretty well, but we decided that we should update and
: get what was at the head.  However after upgrading, I am now getting this

FWIW: please be specific.  "head" of what? the 3x branch? or trunk?  what 
revision in svn does that correspond to? (the "svnversion" command will 
tell you)

: HTTP ERROR 400 undefined field: *
: 
: If I clear the fl parameter (default is set to *, score) then it works fine
: with one big problem, no score data.  If I try and set fl=score I get the same
: error except it says undefined field: score?!
: 
: This works great in the older version, what changed?  I've googled for about
: an hour now and I can't seem to find anything.

i can't reproduce this using either trunk (r1067044) or 3x (r1067045)

all of these queries work just fine...

http://localhost:8983/solr/select/?q=*
http://localhost:8983/solr/select/?q=solr&fl=*,score
http://localhost:8983/solr/select/?q=solr&fl=score
http://localhost:8983/solr/select/?q=solr

...you'll have to provide us with a *lot* more details to help understand 
why you might be getting an error (like: what your configs look like, what 
the request looks like, what the full stack trace of your error is in the 
logs, etc...)




-Hoss


Re: HTTP ERROR 400 undefined field: *

2011-02-03 Thread Grijesh

How did you upgrade?
Did you change everything - all jars, data, and config -
or are you still using anything from the older version?

-
Thanx:
Grijesh
http://lucidimagination.com


RE: Index Not Matching

2011-02-03 Thread Grijesh

http://localhost:8080/select/?q=*:* will return all records from Solr

-
Thanx:
Grijesh
http://lucidimagination.com


Re: response when using my own QParserPlugin

2011-02-03 Thread Grijesh

Are you looking at your output in a browser?
Which browser are you using?
If Chrome, then look at the view source; it will give you the desired xml output. Or
change to any other browser to see the xml output.
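To take the browser out of the picture entirely, the raw response can also be
fetched from the command line; a sketch, with host and query assumed:

curl 'http://localhost:8983/solr/select?q=*:*&wt=xml'

This prints the XML exactly as Solr returns it, element names included.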

-
Thanx:
Grijesh
http://lucidimagination.com


Re: Function Question

2011-02-03 Thread William Bell
I like it. You would think it would be easy to get the values from a
multiValue field in the geodist() function,
but I guess it was not built for that. If anyone has done something
similar, let me know. Thanks.

Bill


On Thu, Feb 3, 2011 at 3:18 PM, Geert-Jan Brits  wrote:
> I don't have a direct answer to your question, but you could consider having
> fields:
> latCombined and LongCombined where you pairwise combine the latitudes and
> longitudes, e.g:
>
> latCombined: 48.0-49.0-50.0
> longcombined: 2.0-3.0-4.0
>
> Than in your custom scorer above split latCombined and longcombined and
> calculate the closests distance to the user-defined point.
>
> hth,
> Geert-Jan
>
> 2011/2/3 William Bell 
>
>> Thoughts?
>>
>> On Wed, Feb 2, 2011 at 10:38 PM, Bill Bell  wrote:
>> >
>> > This is posted as an enhancement on SOLR-2345.
>> >
>> > I am willing to work on it. But I am stuck. I would like to loop through
>> > the lat/long values when they are stored in a multiValue list. But it
>> > appears that I cannot figure out to do that. For example:
>> >
>> > sort=geodist() asc
>> > This should grab the closest point in the MultiValue list, and return the
>> > distance so that is can be scored.
>> > The problem is I cannot find a way to get the MultiValue list?
>> > In function:
>> >
>> src/java/org/apache/solr/search/function/distance/HaversineConstFunction.ja
>> > va
>> > Has code similar to:
>> > VectorValueSource p2;
>> > this.p2 = vs
>> > List sources = p2.getSources();
>> > ValueSource latSource = sources.get(0);
>> > ValueSource lonSource = sources.get(1);
>> > DocValues latVals = latSource.getValues(context1, readerContext1);
>> > DocValues lonVals = lonSource.getValues(context1, readerContext1);
>> > double latRad = latVals.doubleVal(doc) *
>> DistanceUtils.DEGREES_TO_RADIANS;
>> > double lonRad = lonVals.doubleVal(doc) *
>> DistanceUtils.DEGREES_TO_RADIANS;
>> > etc...
>> > It would be good if I could loop through sources.get() but it only
>> returns
>> > 2 sources even when there are 2 pairs of lat/long. The getSources() only
>> > returns the following:
>> > sources:[double(store_0_coordinate), double(store_1_coordinate)]
>> > How do I just get the 4 values in the function?
>> >
>> >
>> >
>>
>


Re: Scale out design patterns

2011-02-03 Thread Ganesh
I have the same idea. I could shard based on a field, but there are two 
practical difficulties.

1. If a normal user is logged in, the results can be fetched from the corresponding 
search server; but if an admin user is logged in, he may need to see all data. 
The query should then be issued across all servers and the results consolidated.

2. Consider a scenario where I am sharding based on the user: I have a single 
search server handling 1000 members. Now, as memory consumption is 
high, I add one more search server. New users can be placed on the second 
server, but what about the old users? Their data will still be added to 
server1. How do I address this issue - is rebuilding the index the only way?

Could anyone share their experience of how they solved scale-out problems?

Regards
Ganesh 


- Original Message - 
From: "Anshum" 
To: 
Sent: Friday, January 21, 2011 12:04 PM
Subject: Re: Scale out design patterns


> Hi Ganesh,
> I'd suggest, if you have a particular dimension/field on which you could
> shard your data such that the query/data breakup gets predictable, that
> would be a good way to scale out e.g. if you have users which are equally
> active/searched then you may want to split their data on a simple mod of
> some numeric (auto increment) userid.
> This works well under normal cases unless your partitioning is not
> predictable.
> 
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
> 
> 
> On Fri, Jan 21, 2011 at 10:52 AM, Ganesh  wrote:
> 
>> Hello all,
>>
>> Could you any one guide me what all the various ways we could scale out?
>>
>> 1. Index:  Add data to the nodes in round-robin.
>>   Search: Query all the nodes and cluster the results using carrot2.
>>
>> 2.Horizontal partitioning and No shared architecture,
>>   Index:   Split the data based on userid and index few set of users data
>> in each node.
>>   Search: Have a mapper kind of application which could tell which userid
>> is mapped to node, redirect the search traffic to corresponding node.
>>
>> Which one is best? Did you guys tried any of these approach. Please share
>> your thoughts.
>>
>> Regards
>> Ganesh
>> Send free SMS to your Friends on Mobile from your Yahoo! Messenger.
>> Download Now! http://messenger.yahoo.com/download.php
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
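As a concrete illustration of the mod-based partitioning Anshum describes, the
shard for a user can be computed in one line (a sketch; the helper and shard
count are hypothetical, not from this thread):

// Java: route a user to a shard by taking the numeric user id modulo the shard count.
static int shardFor(long userId, int numShards) {
    return (int) (userId % numShards);
}

The catch Ganesh raises is real: changing numShards remaps almost every user,
which is why adding a server usually means re-indexing, unless a scheme such as
consistent hashing is used so that only a fraction of the keys move.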


response when using my own QParserPlugin

2011-02-03 Thread Tri Nguyen
Hi,

I wrote a QParserPlugin.  When I hit Solr using this QParserPlugin, the 
response does not have the column names associated with the data, such as:

0 29 0 {!tnav} faketn1 CA city san francisco US 10 - - 495,496,497 
500,657,498,499 us:ca:san francisco faketn,fakeregression 037.74 -122.49 
faketn1 
faketn1 faketn1 faketn1 faketn1 99902837 
+3774-12250|+3774-12250@1|+3772-12252@2 94116:us 495,496,497 
fakecs,fakeatti,fakevenable 500,657,498,499 San Francisco 667 US 37.742369 
-122.491240 Main Dishes Pancakes faketn1 2.99 Enjoy 
best chinese food. faketn1 1;0:0:0:0:8:20% off.0:0:0:3:0.0 4158281775 94116 
ACTION_MODEL TN CA 2350 Taraval St Enjoy best chinese food 40233 - 
5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3
 2027 - 



How do I get the data to be associated with the index columns, so I can parse it 
and know the context of the data (e.g. this value is the business name, this 
one is the address, etc.)?

---


I was hoping it would return something like this, or some sort of structure:

 
- 
- 
  0 
  1 
- 
  on 
  0 
  I_NAME_EXACT:faketn1 
  10 
  2.2 
  
  
- 
- 
- 
  - 
  - 
  
  495,496,497 
  500,657,498,499 
  us:ca:san francisco 
  faketn,fakeregression 
  037.74 
  -122.49 
  faketn1 
  faketn1 
  faketn1 
  faketn1 
  faketn1 
  99902837 
  +3774-12250|+3774-12250@1|+3772-12252@2 
  94116:us 
  495,496,497 
  fakecs,fakeatti,fakevenable 
  500,657,498,499 
  San Francisco 
  667 
  US 
   
  37.742369 
  -122.491240 
  Main Dishes Pancakes faketn1 
2.99 

  Enjoy best chinese food. 
  faketn1 
  1;0:0:0:0:8:20% off.0:0:0:3:0.0 
  4158281775 
  94116 
  ACTION_MODEL 
  TN 
   
  CA 
  2350 Taraval St 
   
   
  Enjoy best chinese food 
  40233 
  - 
  
5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3
 
  2027 
   
  - 
  
  
  
 
Tri

RE: Index Not Matching

2011-02-03 Thread Esclusa, Will
Hello Hoss,

That is exactly what is going on. It seems to be failing in the
analysis of the record. How do I get all the records from SOLR?
http://localhost:8080/select?*.* ?

Thanks!

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, February 03, 2011 5:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Index Not Matching

: At first I thought it could be a schema problem, but we went though it
: with a fine comb and compared it to the one in our stage environment.
: What is really weird is that I grabbed one of the product ID that are
: not showing up in SOLR from the DB, search through the SOLR GUI and it
: found it. 

unless i'm completely misunderstanding you, that means there is a record 
in Solr for that record in the DB -- which suggests the problem is not DB 
records getting indexed, it's analysis of some kind -- does a "*:*" (ie: 
return all docs) query to solr return the same number of results as a 
"select count(*)" query on the DB?

there's really not enough info here to make any meaningful guesses as to

the problem.


-Hoss


Re: Index Not Matching

2011-02-03 Thread Savvas-Andreas Moysidis
Which field type are you specifying in your schema.xml for the fields that
you search upon? If you are using "text", then your input text is
stemmed to a common root, making your searches more flexible. For
instance:
if you have the term "dreaming" in one row/document and the term "dream" in
another, then this could be stemmed to "dreami" or something similar during
indexing.  This effectively causes both of your documents to match when you
search for "dream" in Solr, whereas you would only get 1 result if you
searched directly in your database.

On 3 February 2011 22:37, Geert-Jan Brits  wrote:

> Make sure your index is completely commited.
>
> curl 'http://localhost:8983/solr/update?commit=true'
>
>
> http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22
>
> for an overview:
> http://lucene.apache.org/solr/tutorial.html
>
> hth,
> Geert-Jan
> 
>
> 2011/2/3 Esclusa, Will 
>
> > Both the application and the SOLR gui match (with the incorrect number
> > of course :-) )
> >
> > At first I thought it could be a schema problem, but we went though it
> > with a fine comb and compared it to the one in our stage environment.
> > What is really weird is that I grabbed one of the product ID that are
> > not showing up in SOLR from the DB, search through the SOLR GUI and it
> > found it.
> >
> > -Original Message-
> > From: Savvas-Andreas Moysidis
> > [mailto:savvas.andreas.moysi...@googlemail.com]
> > Sent: Thursday, February 03, 2011 4:57 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Index Not Matching
> >
> > that's odd..are you viewing the results through your application or the
> > admin console? if you aren't, I'd suggest you use the admin console just
> > to
> > eliminate the possibility of an application bug.
> > We had a similar problem in the past and turned out to be a mixup of our
> > dev/test instances..
> >
> > On 3 February 2011 21:41, Esclusa, Will 
> > wrote:
> >
> > > Hello Saavs,
> > >
> > > I am 100% sure we are not updating the DB after we index the data. We
> > > are specifying the same fields on both queries. Our prod boxes do not
> > > have access to QA or DEV, so I would expect a connection error when
> > > indexing if this is the case. No connection errors in the logs.
> > >
> > >
> > >
> > > -Original Message-
> > > From: Savvas-Andreas Moysidis
> > > [mailto:savvas.andreas.moysi...@googlemail.com]
> > > Sent: Thursday, February 03, 2011 4:26 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Index Not Matching
> > >
> > > Hello,
> > >
> > > Are you definitely positive your database isn't updated after you
> > index
> > > your
> > > data? Are you querying against the same field(s) specifying the same
> > > criteria both in Solr and in the database?
> > > Any chance you might be pointing to a dev/test instance of Solr ?
> > >
> > > Regards,
> > > - Savvas
> > >
> > > On 3 February 2011 20:17, Esclusa, Will 
> > > wrote:
> > >
> > > > Greetings!
> > > >
> > > >
> > > >
> > > > My organization is new to SOLR, so please bare with me.  At times,
> > we
> > > > experience an out of sync condition between SOLR index files and our
> > > > Database. We resolved that by clearing the index file and performing
> > a
> > > full
> > > > crawl of the database. Last time we noticed an out of sync
> > condition,
> > > we
> > > > went through our procedure of deleting and crawling, but this time
> > it
> > > did
> > > > not fix it.
> > > >
> > > >
> > > >
> > > > For example, search for swim on the DB and we get 440 products, but
> > > yet
> > > > SOLR states we have 214 products. Has anyone experience anything
> > like
> > > this?
> > > > Does anyone have any suggestions on a trace we can turn on? Again,
> > we
> > > are
> > > > new to SOLR so any help you can provide is greatly appreciated.
> > > >
> > > >
> > > >
> > > > Thanks!
> > > >
> > > >
> > > >
> > > > Will
> > > >
> > > >
> > > >
> > > >
> > >
> >
>


RE: Index Not Matching

2011-02-03 Thread Chris Hostetter
: At first I thought it could be a schema problem, but we went though it
: with a fine comb and compared it to the one in our stage environment.
: What is really weird is that I grabbed one of the product ID that are
: not showing up in SOLR from the DB, search through the SOLR GUI and it
: found it. 

unless i'm completely misunderstanding you, that means there is a record 
in Solr for that record in the DB -- which suggests the problem is not DB 
records getting indexed, it's analysis of some kind -- does a "*:*" (ie: 
return all docs) query to solr return the same number of results as a 
"select count(*)" query on the DB?

there's really not enough info here to make any meaningful guesses as to 
the problem.


-Hoss


Re: Index Not Matching

2011-02-03 Thread Geert-Jan Brits
Make sure your index is completely commited.

curl 'http://localhost:8983/solr/update?commit=true'

http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22

for an overview:
http://lucene.apache.org/solr/tutorial.html

hth,
Geert-Jan


2011/2/3 Esclusa, Will 

> Both the application and the SOLR gui match (with the incorrect number
> of course :-) )
>
> At first I thought it could be a schema problem, but we went though it
> with a fine comb and compared it to the one in our stage environment.
> What is really weird is that I grabbed one of the product ID that are
> not showing up in SOLR from the DB, search through the SOLR GUI and it
> found it.
>
> -Original Message-
> From: Savvas-Andreas Moysidis
> [mailto:savvas.andreas.moysi...@googlemail.com]
> Sent: Thursday, February 03, 2011 4:57 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Index Not Matching
>
> that's odd..are you viewing the results through your application or the
> admin console? if you aren't, I'd suggest you use the admin console just
> to
> eliminate the possibility of an application bug.
> We had a similar problem in the past and turned out to be a mixup of our
> dev/test instances..
>
> On 3 February 2011 21:41, Esclusa, Will 
> wrote:
>
> > Hello Saavs,
> >
> > I am 100% sure we are not updating the DB after we index the data. We
> > are specifying the same fields on both queries. Our prod boxes do not
> > have access to QA or DEV, so I would expect a connection error when
> > indexing if this is the case. No connection errors in the logs.
> >
> >
> >
> > -Original Message-
> > From: Savvas-Andreas Moysidis
> > [mailto:savvas.andreas.moysi...@googlemail.com]
> > Sent: Thursday, February 03, 2011 4:26 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Index Not Matching
> >
> > Hello,
> >
> > Are you definitely positive your database isn't updated after you
> index
> > your
> > data? Are you querying against the same field(s) specifying the same
> > criteria both in Solr and in the database?
> > Any chance you might be pointing to a dev/test instance of Solr ?
> >
> > Regards,
> > - Savvas
> >
> > On 3 February 2011 20:17, Esclusa, Will 
> > wrote:
> >
> > > Greetings!
> > >
> > >
> > >
> > > My organization is new to SOLR, so please bare with me.  At times,
> we
> > > experience an out of sync condition between SOLR index files and our
> > > Database. We resolved that by clearing the index file and performing
> a
> > full
> > > crawl of the database. Last time we noticed an out of sync
> condition,
> > we
> > > went through our procedure of deleting and crawling, but this time
> it
> > did
> > > not fix it.
> > >
> > >
> > >
> > > For example, search for swim on the DB and we get 440 products, but
> > yet
> > > SOLR states we have 214 products. Has anyone experience anything
> like
> > this?
> > > Does anyone have any suggestions on a trace we can turn on? Again,
> we
> > are
> > > new to SOLR so any help you can provide is greatly appreciated.
> > >
> > >
> > >
> > > Thanks!
> > >
> > >
> > >
> > > Will
> > >
> > >
> > >
> > >
> >
>


RE: Index Not Matching

2011-02-03 Thread Esclusa, Will
Both the application and the SOLR gui match (with the incorrect number
of course :-) )

At first I thought it could be a schema problem, but we went through it
with a fine-tooth comb and compared it to the one in our stage environment.
What is really weird is that I grabbed one of the product IDs that are
not showing up in SOLR from the DB, searched for it through the SOLR GUI, and it
found it. 

-Original Message-
From: Savvas-Andreas Moysidis
[mailto:savvas.andreas.moysi...@googlemail.com] 
Sent: Thursday, February 03, 2011 4:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Not Matching

that's odd..are you viewing the results through your application or the
admin console? if you aren't, I'd suggest you use the admin console just
to
eliminate the possibility of an application bug.
We had a similar problem in the past and turned out to be a mixup of our
dev/test instances..

On 3 February 2011 21:41, Esclusa, Will 
wrote:

> Hello Saavs,
>
> I am 100% sure we are not updating the DB after we index the data. We
> are specifying the same fields on both queries. Our prod boxes do not
> have access to QA or DEV, so I would expect a connection error when
> indexing if this is the case. No connection errors in the logs.
>
>
>
> -Original Message-
> From: Savvas-Andreas Moysidis
> [mailto:savvas.andreas.moysi...@googlemail.com]
> Sent: Thursday, February 03, 2011 4:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Index Not Matching
>
> Hello,
>
> Are you definitely positive your database isn't updated after you
index
> your
> data? Are you querying against the same field(s) specifying the same
> criteria both in Solr and in the database?
> Any chance you might be pointing to a dev/test instance of Solr ?
>
> Regards,
> - Savvas
>
> On 3 February 2011 20:17, Esclusa, Will 
> wrote:
>
> > Greetings!
> >
> >
> >
> > My organization is new to SOLR, so please bare with me.  At times,
we
> > experience an out of sync condition between SOLR index files and our
> > Database. We resolved that by clearing the index file and performing
a
> full
> > crawl of the database. Last time we noticed an out of sync
condition,
> we
> > went through our procedure of deleting and crawling, but this time
it
> did
> > not fix it.
> >
> >
> >
> > For example, search for swim on the DB and we get 440 products, but
> yet
> > SOLR states we have 214 products. Has anyone experience anything
like
> this?
> > Does anyone have any suggestions on a trace we can turn on? Again,
we
> are
> > new to SOLR so any help you can provide is greatly appreciated.
> >
> >
> >
> > Thanks!
> >
> >
> >
> > Will
> >
> >
> >
> >
>


DB2 and DataImportHandler

2011-02-03 Thread no spam
I get the following error when trying to index using a DataImportHandler
with solr 1.4.1.  I see that there is an open JIRA with no resolution.  Do I
have to write my own data import handler to work around this issue?

Thanks,
Mark

Feb 3, 2011 5:21:09 PM org.apache.solr.handler.dataimport.JdbcDataSource
closeConnection
SEVERE: Ignoring Error when closing connection
com.ibm.db2.jcc.b.SqlException: [jcc][t4][10251][10308][3.50.152]
java.sql.Connection.close() requested while a transaction is in progress on
the connection.
The transaction remains active, and the connection cannot be closed.
ERRORCODE=-4471, SQLSTATE=null
at com.ibm.db2.jcc.b.wc.a(wc.java:55)
at com.ibm.db2.jcc.b.wc.a(wc.java:119)
at com.ibm.db2.jcc.b.eb.t(eb.java:996)
at com.ibm.db2.jcc.b.eb.w(eb.java:1019)
at com.ibm.db2.jcc.b.eb.u(eb.java:1005)
at com.ibm.db2.jcc.b.eb.close(eb.java:989)
at
org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399)
at
org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390)
at
org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:173)
at
org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:339)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
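One mitigation that may be worth trying (an assumption on my part, not a
confirmed fix for the JIRA issue mentioned above): mark the DIH data source
read-only, which makes JdbcDataSource put the connection into autocommit mode,
so no transaction should be left open when the connection is closed. A
data-config.xml sketch with hypothetical connection details:

<dataSource type="JdbcDataSource"
            driver="com.ibm.db2.jcc.DB2Driver"
            url="jdbc:db2://dbhost:50000/MYDB"
            user="dbuser" password="dbpass"
            readOnly="true"/>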


Re: Function Question

2011-02-03 Thread Geert-Jan Brits
I don't have a direct answer to your question, but you could consider having
fields:
latCombined and LongCombined where you pairwise combine the latitudes and
longitudes, e.g:

latCombined: 48.0-49.0-50.0
longcombined: 2.0-3.0-4.0

Then, in your custom scorer, split latCombined and longCombined and
calculate the closest distance to the user-defined point.

hth,
Geert-Jan
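A plain-Java sketch of that split-and-scan idea (hypothetical helper, not Solr
API; ';' is used as the pair delimiter here, since '-' would clash with negative
coordinates):

// Given latCombined "48.0;49.0;50.0" and lonCombined "2.0;3.0;4.0", return
// the haversine distance (km) from (userLat, userLon) to the closest stored point.
static double closestKm(String latCombined, String lonCombined,
                        double userLat, double userLon) {
    String[] lats = latCombined.split(";");
    String[] lons = lonCombined.split(";");
    double uLat = Math.toRadians(userLat);
    double uLon = Math.toRadians(userLon);
    double best = Double.MAX_VALUE;
    for (int i = 0; i < lats.length; i++) {
        double lat = Math.toRadians(Double.parseDouble(lats[i]));
        double lon = Math.toRadians(Double.parseDouble(lons[i]));
        // haversine formula, mean earth radius ~6371 km
        double a = Math.pow(Math.sin((lat - uLat) / 2), 2)
                 + Math.cos(uLat) * Math.cos(lat) * Math.pow(Math.sin((lon - uLon) / 2), 2);
        best = Math.min(best, 2 * 6371.0 * Math.asin(Math.sqrt(a)));
    }
    return best;
}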

2011/2/3 William Bell 

> Thoughts?
>
> On Wed, Feb 2, 2011 at 10:38 PM, Bill Bell  wrote:
> >
> > This is posted as an enhancement on SOLR-2345.
> >
> > I am willing to work on it. But I am stuck. I would like to loop through
> > the lat/long values when they are stored in a multiValue list. But it
> > appears that I cannot figure out to do that. For example:
> >
> > sort=geodist() asc
> > This should grab the closest point in the MultiValue list, and return the
> > distance so that is can be scored.
> > The problem is I cannot find a way to get the MultiValue list?
> > In function:
> >
> src/java/org/apache/solr/search/function/distance/HaversineConstFunction.ja
> > va
> > Has code similar to:
> > VectorValueSource p2;
> > this.p2 = vs
> > List sources = p2.getSources();
> > ValueSource latSource = sources.get(0);
> > ValueSource lonSource = sources.get(1);
> > DocValues latVals = latSource.getValues(context1, readerContext1);
> > DocValues lonVals = lonSource.getValues(context1, readerContext1);
> > double latRad = latVals.doubleVal(doc) *
> DistanceUtils.DEGREES_TO_RADIANS;
> > double lonRad = lonVals.doubleVal(doc) *
> DistanceUtils.DEGREES_TO_RADIANS;
> > etc...
> > It would be good if I could loop through sources.get() but it only
> returns
> > 2 sources even when there are 2 pairs of lat/long. The getSources() only
> > returns the following:
> > sources:[double(store_0_coordinate), double(store_1_coordinate)]
> > How do I just get the 4 values in the function?
> >
> >
> >
>


Re: Index Not Matching

2011-02-03 Thread Savvas-Andreas Moysidis
that's odd..are you viewing the results through your application or the
admin console? if you aren't, I'd suggest you use the admin console just to
eliminate the possibility of an application bug.
We had a similar problem in the past and turned out to be a mixup of our
dev/test instances..

On 3 February 2011 21:41, Esclusa, Will  wrote:

> Hello Saavs,
>
> I am 100% sure we are not updating the DB after we index the data. We
> are specifying the same fields on both queries. Our prod boxes do not
> have access to QA or DEV, so I would expect a connection error when
> indexing if this is the case. No connection errors in the logs.
>
>
>
> -Original Message-
> From: Savvas-Andreas Moysidis
> [mailto:savvas.andreas.moysi...@googlemail.com]
> Sent: Thursday, February 03, 2011 4:26 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Index Not Matching
>
> Hello,
>
> Are you definitely positive your database isn't updated after you index
> your
> data? Are you querying against the same field(s) specifying the same
> criteria both in Solr and in the database?
> Any chance you might be pointing to a dev/test instance of Solr ?
>
> Regards,
> - Savvas
>
> On 3 February 2011 20:17, Esclusa, Will 
> wrote:
>
> > Greetings!
> >
> >
> >
> > My organization is new to SOLR, so please bare with me.  At times, we
> > experience an out of sync condition between SOLR index files and our
> > Database. We resolved that by clearing the index file and performing a
> full
> > crawl of the database. Last time we noticed an out of sync condition,
> we
> > went through our procedure of deleting and crawling, but this time it
> did
> > not fix it.
> >
> >
> >
> > For example, search for swim on the DB and we get 440 products, but
> yet
> > SOLR states we have 214 products. Has anyone experience anything like
> this?
> > Does anyone have any suggestions on a trace we can turn on? Again, we
> are
> > new to SOLR so any help you can provide is greatly appreciated.
> >
> >
> >
> > Thanks!
> >
> >
> >
> > Will
> >
> >
> >
> >
>


RE: Index Not Matching

2011-02-03 Thread Esclusa, Will
Hello Savvas,

I am 100% sure we are not updating the DB after we index the data. We
are specifying the same fields on both queries. Our prod boxes do not
have access to QA or DEV, so I would expect a connection error when
indexing if this is the case. No connection errors in the logs. 



-Original Message-
From: Savvas-Andreas Moysidis
[mailto:savvas.andreas.moysi...@googlemail.com] 
Sent: Thursday, February 03, 2011 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Not Matching

Hello,

Are you definitely positive your database isn't updated after you index
your
data? Are you querying against the same field(s) specifying the same
criteria both in Solr and in the database?
Any chance you might be pointing to a dev/test instance of Solr ?

Regards,
- Savvas

On 3 February 2011 20:17, Esclusa, Will 
wrote:

> Greetings!
>
>
>
> My organization is new to SOLR, so please bare with me.  At times, we
> experience an out of sync condition between SOLR index files and our
> Database. We resolved that by clearing the index file and performing a
full
> crawl of the database. Last time we noticed an out of sync condition,
we
> went through our procedure of deleting and crawling, but this time it
did
> not fix it.
>
>
>
> For example, search for swim on the DB and we get 440 products, but
yet
> SOLR states we have 214 products. Has anyone experience anything like
this?
> Does anyone have any suggestions on a trace we can turn on? Again, we
are
> new to SOLR so any help you can provide is greatly appreciated.
>
>
>
> Thanks!
>
>
>
> Will
>
>
>
>


Re: Function Question

2011-02-03 Thread William Bell
Thoughts?

On Wed, Feb 2, 2011 at 10:38 PM, Bill Bell  wrote:
>
> This is posted as an enhancement on SOLR-2345.
>
> I am willing to work on it. But I am stuck. I would like to loop through
> the lat/long values when they are stored in a multiValue list. But it
> appears that I cannot figure out to do that. For example:
>
> sort=geodist() asc
> This should grab the closest point in the MultiValue list, and return the
> distance so that is can be scored.
> The problem is I cannot find a way to get the MultiValue list?
> In function:
> src/java/org/apache/solr/search/function/distance/HaversineConstFunction.ja
> va
> Has code similar to:
> VectorValueSource p2;
> this.p2 = vs
> List sources = p2.getSources();
> ValueSource latSource = sources.get(0);
> ValueSource lonSource = sources.get(1);
> DocValues latVals = latSource.getValues(context1, readerContext1);
> DocValues lonVals = lonSource.getValues(context1, readerContext1);
> double latRad = latVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
> double lonRad = lonVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
> etc...
> It would be good if I could loop through sources.get() but it only returns
> 2 sources even when there are 2 pairs of lat/long. The getSources() only
> returns the following:
> sources:[double(store_0_coordinate), double(store_1_coordinate)]
> How do I just get the 4 values in the function?
>
>
>


Re: Index Not Matching

2011-02-03 Thread Savvas-Andreas Moysidis
Hello,

Are you definitely positive your database isn't updated after you index your
data? Are you querying against the same field(s) specifying the same
criteria both in Solr and in the database?
Any chance you might be pointing to a dev/test instance of Solr ?

Regards,
- Savvas

On 3 February 2011 20:17, Esclusa, Will  wrote:

> Greetings!
>
>
>
> My organization is new to SOLR, so please bare with me.  At times, we
> experience an out of sync condition between SOLR index files and our
> Database. We resolved that by clearing the index file and performing a full
> crawl of the database. Last time we noticed an out of sync condition, we
> went through our procedure of deleting and crawling, but this time it did
> not fix it.
>
>
>
> For example, search for swim on the DB and we get 440 products, but yet
> SOLR states we have 214 products. Has anyone experience anything like this?
> Does anyone have any suggestions on a trace we can turn on? Again, we are
> new to SOLR so any help you can provide is greatly appreciated.
>
>
>
> Thanks!
>
>
>
> Will
>
>
>
>


HTTP ERROR 400 undefined field: *

2011-02-03 Thread Jed Glazner

Hey Guys,

I was working on a checkout of the 3.x branch from about 6 months ago. 
Everything was working pretty well, but we decided that we should update 
and get what was at the head.  However, after upgrading, I am now getting 
this error through the admin:


HTTP ERROR 400 undefined field: *

If I clear the fl parameter (the default is set to *, score) then it works 
fine, with one big problem: no score data.  If I try to set fl=score I 
get the same error, except it says undefined field: score?!


This works great in the older version; what changed?  I've googled for 
about an hour now and I can't seem to find anything.



Jed.


Index Not Matching

2011-02-03 Thread Esclusa, Will
Greetings!

 

My organization is new to SOLR, so please bear with me.  At times, we 
experience an out-of-sync condition between the SOLR index files and our database. 
We used to resolve that by clearing the index file and performing a full crawl of the 
database. The last time we noticed an out-of-sync condition, we went through our 
procedure of deleting and crawling, but this time it did not fix it.

 

For example, a search for "swim" on the DB returns 440 products, yet SOLR 
states we have 214 products. Has anyone experienced anything like this? Does 
anyone have any suggestions on a trace we can turn on? Again, we are new to 
SOLR, so any help you can provide is greatly appreciated.

 

Thanks!

 

Will 

 



Re: Use Parallel Search

2011-02-03 Thread Grant Ingersoll
Can you describe a bit more what you are searching (types of docs) and what 
your query rate looks like?  Also, what features are you using?  Faceting?  
Sorting? ...

On Feb 3, 2011, at 1:06 PM, Gustavo Maia wrote:

> Hello,
> 
> Let me give a brief description of my scenario.
> Today I am only using Lucene 2.9.3. I have an index of 30 million documents
> distributed on three machines and each machine with 6 hds (15k rpm).
> The server queries the search index using the remote class search. And each
> machine is made to search using the parallel search (search simultaneously
> in 6 hds).
> So during the search are simulating using the three machines and 18 hds,
> returning me to a very good response time.
> 
> 
> Today I am studying the SOLR and am interested in knowing more about the
> searches and use of distributed parallel search on the same machine. What
> would be the best scenario using SOLR that is better than I already am using
> today only with lucene?
>  Note: do I need to have 6 Solr instances installed on each machine from my
> server?

No, you generally treat Solr like a database and provision it separately from 
your app.  30M docs may very well all fit nicely on one machine depending on 
some of your answers above (I've certainly seen bigger)


> One for each hd? Or is there some other alternative way for me to use
> the 6 hds without having 6 instances of Solr server?

I'd probably start simple and see what I can do in 1 instance of Solr and what 
query/indexing throughput you can get.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



DataImportHandler usage with RDF database

2011-02-03 Thread McGibbney, Lewis John
Hello List,

I am very interested in DataImportHandler. I have data stored in an RDF db and 
wish to use this data to boost query results via Solr. I wish to keep this data 
stored in the db, as I have a web app which directly maintains it. Is it 
possible to use a DataImportHandler to read RDF data from the db into memory, without 
sending an index commit to Solr? As far as I can see, DataImportHandler 
currently supports full and delta imports, which means I would be indexing. So 
far I have yet to find a requestHandler which is able to read and then hold data 
in memory, then use this data elsewhere prior to returning documents via the 
queryResponseWriter.

Can anyone provide their thoughts/insight

Thank you

Lewis




Re: geodist and spacial search

2011-02-03 Thread Grant Ingersoll
Use a filter query?  See the {!geofilt} stuff on the wiki page.  That gives you 
your filter to restrict down your result set; then you can sort by exact 
distance over just those docs that make it through the filter.
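Combined into one request, that looks roughly like this (host, field, point and
radius assumed, following the SpatialSearch wiki examples):

http://localhost:8983/solr/select?q=*:*&sfield=store&pt=45.15,-93.85&d=20&fq={!geofilt}&sort=geodist() asc

The {!geofilt} filter cheaply discards everything outside the 20 km radius, so
geodist() only has to sort the documents that survive the filter.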


On Feb 3, 2011, at 10:24 AM, Eric Grobler wrote:

> Hi Erick,
> 
> Thanks I saw that example, but I am trying to sort by distance AND specify
> the max distance in 1 query.
> 
> The reason is:
> running bbox on 2 million documents with a 20km distance takes only 200ms.
> Sorting 2 million documents by distance takes over 1.5 seconds!
> 
> So it will be much faster for solr to first filter the 20km documents and
> then to sort them.
> 
> Regards
> Ericz
> 
> On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson wrote:
> 
>> Further down that very page ...
>> 
>> Here's an example of sorting by distance ascending:
>> 
>> 
>>   ...&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc
>>   (http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc)
>> 
>> The key is just the &sort=geodist(), I'm pretty sure that's independent of
>> the bbox, but
>> I could be wrong.
>> 
>> Best
>> Erick
>> 
>> On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler >> wrote:
>> 
>>> Hi
>>> 
>>> In http://wiki.apache.org/solr/SpatialSearch
>>> there is an example of a bbox filter and a geodist function.
>>> 
>>> Is it possible to do a bbox filter and sort by distance - combine the
>> two?
>>> 
>>> Thanks
>>> Ericz
>>> 
>> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem docs using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Solr for finding similar word between two documents

2011-02-03 Thread rohan rai
Let's say I have one document (file) which is large and contains words inside it.

And the 2nd document is also a text file.

The problem is to find all those words in the 2nd document which are present in the first
document, when both of the files are large.

Regards
Rohan

On Fri, Feb 4, 2011 at 1:01 AM, openvictor Open wrote:

> Rohan : what you want to do can be done with quite little effort if your
> document has a limited size (up to some Mo) with common and basic
> structures
> like Hasmap.
>
> Do you have any additional information on your problem so that we can give
> you more useful inputs ?
>
> 2011/2/3 Gora Mohanty 
>
> > On Thu, Feb 3, 2011 at 11:32 PM, rohan rai  wrote:
> > > Is there a way to use solr and get similar words between two document
> > > (files).
> > [...]
> >
> > This is *way* too vague t make any sense out of. Could you elaborate,
> > as I could have sworn that what you seem to want is the essential
> > function of a search engine.
> >
> > Regards,
> > Gora
> >
>


Re: Use Parallel Search

2011-02-03 Thread Em
Hello Gustavo,

well, I did not use Nutch at all, but I have some experience with using Solr.

In Solr you could use a multicore setup where each core points to
another hard drive of your server. For other Solr servers (and cores as
well) each core is a separate index, so to query all drives of one
server you have to do a distributed request to get the results from all
cores (indices).
You get a little bit of HTTP overhead, because you have to send six
HTTP requests per server to get your results.
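A distributed request across cores is just the shards parameter; a sketch with
hypothetical host and core names:

http://server1:8983/solr/core0/select?q=foo&shards=server1:8983/solr/core0,server1:8983/solr/core1,server1:8983/solr/core2

Each entry in shards is queried and the partial results are merged before the
response is returned.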

You could also set up 6 Solr-instances per box or 3 with two cores per
instance, but I do not see any reason to do so.


Could you please explain what you mean by "remote class search"? Is it
a Nutch-specific thing I have never heard of before?

There is no difference between a Lucene index created by Solr and a
Lucene index created by Nutch or Lucene itself.
Solr is just a server implementation of the Lucene framework.

Regards

Am 03.02.2011 19:06, schrieb Gustavo Maia:
> Hello,
>
> Let me give a brief description of my scenario.
> Today I am only using Lucene 2.9.3. I have an index of 30 million documents
> distributed on three machines and each machine with 6 hds (15k rpm).
> The server queries the search index using the remote class search. And each
> machine is made to search using the parallel search (search simultaneously
> in 6 hds).
> So during the search are simulating using the three machines and 18 hds,
> returning me to a very good response time.
>
>
> Today I am studying the SOLR and am interested in knowing more about the
> searches and use of distributed parallel search on the same machine. What
> would be the best scenario using SOLR that is better than I already am using
> today only with lucene?
>   Note: do I need to have 6 Solr instances installed on each machine, one
> for each hd? Or is there some other alternative way for me to use
> the 6 hds without having 6 instances of Solr server?
>
>   Another question would be if the SOLR would have some limiting size index
> for Hard drive? It would be interesting not index too big because when the
> index increased the longer the search.
>
> Thanks for everything.
>
>
> Gustavo Maia
>



Re: Solr for finding similar word between two documents

2011-02-03 Thread openvictor Open
Rohan: what you want to do can be done with quite little effort if your
document has a limited size (up to a few MB) with common and basic structures
like a HashMap.

Do you have any additional information on your problem, so that we can give
you more useful input?

2011/2/3 Gora Mohanty 

> On Thu, Feb 3, 2011 at 11:32 PM, rohan rai  wrote:
> > Is there a way to use solr and get similar words between two document
> > (files).
> [...]
>
> This is *way* too vague t make any sense out of. Could you elaborate,
> as I could have sworn that what you seem to want is the essential
> function of a search engine.
>
> Regards,
> Gora
>


Re: Solr for finding similar word between two documents

2011-02-03 Thread Gora Mohanty
On Thu, Feb 3, 2011 at 11:32 PM, rohan rai  wrote:
> Is there a way to use solr and get similar words between two document
> (files).
[...]

This is *way* too vague to make any sense out of. Could you elaborate,
as I could have sworn that what you seem to want is the essential
function of a search engine.

Regards,
Gora


Re: Using terms and N-gram

2011-02-03 Thread openvictor Open
Thank you for these inputs.

I was silly asking for ngrams because I already knew it. I think I was tired
yesterday...

Thank you Erick Erickson, once again you gave me a more than useful comment.
Indeed, shingles seem to be the perfect fit for the work I want to do. I
will try to implement that tonight and I will come back to report whether it's
working.

Regards,
Victor

2011/2/3 Erick Erickson 

> First, you'll get a lot of insight by defining something simply and looking
> at the analysis page from solr admin. That's a very valuable page.
>
> To your question:
> commongrams are "shingles" that work between stopwords and
> other words. For instance, "this is some text" gets analyzed into
> this, this_is, is, is_some, some text. Note that the stopwords
> are the only things that get combined with the text after.
>
> NGrams form on letters. It's too long to post the whole thing, but
> the above phrase gets analyzed as
> t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me.. It splits a
> single
> token into grams whereas commongrams essentially combines tokens
> when they're stopwords.
>
> Have you looked at "shingles"? See:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
> Best
> Erick
>
>
> On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open  >wrote:
>
> > Thank you, I will do that and hopefully it will be handy!
> >
> > But can someone explain to me the difference between CommonGramsFilterFactory and
> > NGramFilterFactory? (Maybe the solution is there)
> >
> > Thank you all,
> > best regards
> >
> > 2011/2/3 Grijesh 
> >
> > >
> > > Use analysis.jsp to see what happening at index time and query time
> with
> > > your
> > > input data.You can use highlighting to see if match found.
> > >
> > > -
> > > Thanx:
> > > Grijesh
> > > http://lucidimagination.com
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
>


Re: Using terms and N-gram

2011-02-03 Thread Erick Erickson
First, you'll get a lot of insight by defining something simply and looking
at the analysis page from solr admin. That's a very valuable page.

To your question:
commongrams are "shingles" that work between stopwords and
other words. For instance, "this is some text" gets analyzed into
this, this_is, is, is_some, some, text. Note that the stopwords
are the only things that get combined with the token that follows them.

NGrams form on letters. It's too long to post the whole thing, but
the above phrase gets analyzed as
t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me.. It splits a
single
token into grams whereas commongrams essentially combines tokens
when they're stopwords.

Have you looked at "shingles"? See:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
Best
Erick
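For reference, a minimal fieldType wiring in the shingle filter might look like
this (a sketch only; the name, tokenizer and sizes are assumptions, not taken
from this thread):

<fieldType name="shingle_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="8" outputUnigrams="true"/>
  </analyzer>
</fieldType>

With outputUnigrams="true" the single words are kept alongside the 2- to 8-word
shingles, which matches the 1-gram-to-8-gram behaviour Victor described.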


On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open wrote:

> Thank you, I will do that and hopefully it will be handy!
>
> But can someone explain to me the difference between CommonGramsFilterFactory and
> NGramFilterFactory? (Maybe the solution is there)
>
> Thank you all,
> best regards
>
> 2011/2/3 Grijesh 
>
> >
> > Use analysis.jsp to see what happening at index time and query time with
> > your
> > input data.You can use highlighting to see if match found.
> >
> > -
> > Thanx:
> > Grijesh
> > http://lucidimagination.com
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


RE: Using terms and N-gram

2011-02-03 Thread Bob Sandiford
I don't suppose it's something silly like the fact that your indexing chain 
includes 'words="stopwords.txt"', and your query chain does not?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 




> -Original Message-
> From: openvictor Open [mailto:openvic...@gmail.com]
> Sent: Thursday, February 03, 2011 12:02 AM
> To: solr-user@lucene.apache.org
> Subject: Using terms and N-gram
> 
> Dear all,
> 
> I am trying to implement an autocomplete system for research. But I am
> stuck
> on some problems that I can't solve.
> 
> Here is my problem :
> I give text like:
> "the cat is black" and I want to explore all 1-grams to 8-grams for all
> the text that is passed:
> the, cat, is, black, the cat, cat is, is black, etc...
> 
> In order to do that I have defined the following fieldtype in my schema
> :
> 
> 
> 
>   
> 
>  ignoreCase="true" maxGramSize="8"
>minGramSize="1"/>
>   
>   
> 
>  maxGramSize="8"
>minGramSize="1"/>
>   
> 
> 
> 
> Then the following field :
> 
>  stored="true"/>
> 
> Then I feed solr with some phrases and I was really surprised to see
> that
> Solr didn't behave as expected.
> I went to the schema browser to see the result for the very profound
> query :
> "the cat is black and it rains"
> 
> The results are quite disappointing: first, 1-grams are not found. Some 2-grams
> are found, like the_cat, "and_it", etc... but not what I expected.
> Is there something I am missing here? (By the way, I also tried removing
> the minGramSize and maxGramSize, and even the words attribute.)
> 
> Thank you,
> Victor Kabdebon



Re: changing schema

2011-02-03 Thread Jonathan Rochkind
It could be related to Tomcat. I've had inconsistent experiences there
too: I _thought_ I could delete just the contents of the data/
directory, but at some point I realized that wasn't working, which left
me unsure whether deleting just the contents had ever worked. At the
moment, on my setup, I definitely need to delete the whole data/
directory.


At one point I switched my setup from Jetty to Tomcat, but at about the
same point I switched from single core to multi-core too. So it
could be a multi-core thing (which seems somewhat more likely than
Jetty vs Tomcat making a difference). Or it could be something
else entirely that none of us knows; I'm just reporting my limited
observations from experience. :)


Jonathan

On 2/3/2011 8:17 AM, Erick Erickson wrote:

Erik:

Is this a Tomcat-specific issue? Because I regularly delete just the
data/index directory on my Windows
box running Jetty without any problems. (3_x and trunk)

Mostly want to know because I just encouraged someone to just delete the
index dir based on my
experience...

Thanks
Erick

On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher wrote:


the trick is, you have to remove the data/ directory, not just the
data/index subdirectory.  and of course then restart Solr.

or delete *:*?commit=true, depending on what's the best fit for your ops.

Erik

On Feb 1, 2011, at 11:41 , Dennis Gearon wrote:


> I tried removing the index directory once, and tomcat refused to start up
> because it didn't have a segments file.




- Original Message 
From: Erick Erickson
To: solr-user@lucene.apache.org
Sent: Tue, February 1, 2011 5:04:51 AM
Subject: Re: changing schema

That sounds right. You can cheat and just remove /data/index
rather than delete *:* though (you should probably do that with the Solr
instance stopped)

Make sure to remove the directory "index" as well.

Best
Erick

On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon wrote:

Anyone got a great little script for changing a schema?

i.e., after changing:
database,
the view in the database for data import
the data-config.xml file
the schema.xml file

I BELIEVE that I have to run:
a delete command for the whole index *:*
a full import and optimize

This all sound right?

Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually

a

better
idea to learn from others’ mistakes, so you do not have to make them
yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.






Use Parallel Search

2011-02-03 Thread Gustavo Maia
Hello,

Let me give a brief description of my scenario.
Today I am only using Lucene 2.9.3. I have an index of 30 million documents
distributed across three machines, each machine with 6 HDs (15k rpm).
The server queries the search index using the remote search class, and each
machine searches using parallel search (searching its 6 HDs simultaneously).
So a single search effectively uses the three machines and 18 HDs,
giving me a very good response time.


Today I am studying Solr and am interested in knowing more about
distributed search and the use of parallel search on the same machine. What
would be the best Solr setup, one better than what I already have today
with Lucene alone?
  Note: would I need to install 6 Solr instances on each machine, one for
each HD? Or is there some alternative way for me to use the 6 HDs without
having 6 instances of the Solr server?
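From what I have read on the wiki, distributed search across instances is
driven by the shards parameter on the query, e.g. (host names here are
placeholders, not my real machines):

http://server1:8983/solr/select?q=water&shards=server1:8983/solr,server2:8983/solr,server3:8983/solr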

  Another question would be whether Solr has some limit on index size per
hard drive? It would be good not to let an index get too big, because the
larger the index grows, the longer the search takes.

Thanks for everything.


Gustavo Maia


What is the best protocol for data transfer rate HTTP or RMI?

2011-02-03 Thread Gustavo Maia
Hello,



I am doing a comparative study between Lucene and Solr, and I wish to obtain
more concrete data comparing the data transfer of Lucene's remote search
(which uses RMI) with that of Solr (which uses the HTTP protocol).




Gustavo Maia


Solr for finding similar word between two documents

2011-02-03 Thread rohan rai
Is there a way to use Solr to get the similar words between two documents
(files)?

Any ideas

Regards
Rohan


Re: Terms and termscomponent questions

2011-02-03 Thread Erick Erickson
Ah, good. Good luck with the rest of your app! WordDelimiterFilterFactory
is powerful, but tricky ...

Best
Erick

On Thu, Feb 3, 2011 at 9:51 AM, openvictor Open wrote:

> Dear Erick,
>
> You were totally right about the fact that I didn't use any space to
> separate words, cause SolR to concatenate words !
> Everything is solved now. Thank you very much for your help !
>
> Best regards,
> Victor Kabdebon
>
> 2011/2/3 Erick Erickson 
>
> > There are a couple of things going on here. First,
> > WordDelimiterFilterFactory is
> > splitting things up on letter/number boundaries. Take a look at:
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >
> > for a list of *some* of the available tokenizers. You may want to just
> use
> > one of the others, or change the parameters to
> > WordDelimiterFilterFactory
> > so it doesn't split the way it does now.
> >
> > See the page: http://localhost:8983/solr/admin/analysis.jsp and check
> the
> > "verbose"
> > box to see what the effects of the various elements in your analysis
> chain
> > are.
> > This is a very important page for understanding the analysis part of the
> > whole
> > operation.
> >
> > Second, if you've been trying different things out, you may well have
> some
> > old stuff in your index. When you delete documents, the terms are still
> in
> > the index until an optimize. I'd advise starting with a clean slate for
> > your
> > experiments each time. The cheap way to do this is stop your server and
> > delete /data/index. Delete the index directory too, not just
> the
> > contents. So it's possible your TermsComponent is returning data from
> > previous
> > attempts, because I sure don't see how the concatenated terms would be
> > in this index given the definition you've posted.
> >
> > And if none of that works, well, we'll try something else ..
> >
> > Best
> > Erick
> >
> > On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open wrote:
> >
> > > Dear Erick,
> > >
> > > Thank you for your answer, here is my fieldtype definition. I took the
> > > standard one because I don't need a better one for this field
> > >
> > > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > > catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > > protected="protwords.txt"/>
> > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > > ignoreCase="true" expand="true"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > > words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > > catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > > protected="protwords.txt"/>
> > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Now my field :
> > >
> > > <field name="p_field" type="text" indexed="true" stored="true"/>
> > >
> > > But I have a doubt now... Do I really put a space between words or is
> it
> > > just a coma... If I only put a coma then the whole process is going to
> be
> > > impacted ? What I don't really understand is that I find the separate
> > > words,
> > > but also their concatenation (but again in one direction only). Let me
> > > explain : if a have "man" "bear" "pig" I will find :
> > > "manbearpig" "bearpig" but never pigman or anyother combination in a
> > > different order.
> > >
> > > Thank you very much
> > > Best Regards,
> > > Victor
> > >
> > > 2011/2/1 Erick Erickson 
> > >
> > > > Nope, this isn't what I'd expect. There are a couple of
> possibilities:
> > > > 1> check out what WordDelimiterFilterFactory is doing, although
> > > > if you're really sending spaces that's probably not it.
> > > > 2> Let's see the <fieldtype> and <field> definitions for the field
> > > > in question. type="text" doesn't say anything about analysis,
> > > > and that's where I'd expect you're having trouble. In particular
> > > > if your analysis chain uses KeywordTokenizerFactory for instance.
> > > > 3> Look at the admin/schema browse page, look at your field and
> > > > see what the actual tokens are. That'll tell you what
> > TermsComponents
> > > > is returning, perhaps the concatenation is happening somewhere
> > > > else.
> > > >
> > > > Bottom line: Solr will not concatenate terms like this unless you
> tell
> > it
> > > > to,
> > > > so I suspect you're telling it to, you just don't realize it ...
> > > >
> > > > Best
> > > > Erick
> > > >
> > > > On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open <openvic...@gmail.com> wrote:
> > > >
> > > > > Dear Solr users,
> > > > >
> > > > > I am currently using SolR and TermsComponents to make an auto
> suggest
> > > for
> > > > > my
> > > > > website.
> > > > >
> > > > > I have a field called p_field indexed and stored with type="text"
> in
> > > the
> > > > > schema xml. Nothing out of the usual.
> > > > > I feed to Solr a set of words separated by a coma and a space such
> as
> > > > (for
> > > > > two documents) :
> > > > >
> > > > > Document 1:
> > > > > word11, word12, word13. word14
> > > > >
> > > > > Document 2:
> > > > > word21, word22, word23. wo

Re: Open Too Many Files

2011-02-03 Thread Gustavo Maia
Try it.

ulimit -n20



2011/2/3 Grijesh 

>
> the best option is to use
> <useCompoundFile>true</useCompoundFile>
>
> decreasing mergeFactor may make indexing slow
>
> -
> Thanx:
> Grijesh
> http://lucidimagination.com
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Open-Too-Many-Files-tp2406289p2412415.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: changing schema

2011-02-03 Thread Dennis Gearon
Well, the nice thing is that I have an Amazon-based dev server, and it's
stored as an AMI. So if I screw something up, I just throw away that server
and get a fresh one, all configured and full of dev data, and BAM, I'm back
to where I was.

So I'll try it again with the -rf flags.

I did shut down the server and I am using Tomcat.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



- Original Message 
From: Gora Mohanty 
To: solr-user@lucene.apache.org
Sent: Thu, February 3, 2011 6:56:29 AM
Subject: Re: changing schema

On Thu, Feb 3, 2011 at 6:47 PM, Erick Erickson  wrote:
> Erik:
>
> Is this a Tomcat-specific issue? Because I regularly delete just the
> data/index directory on my Windows
> box running Jetty without any problems. (3_x and trunk)
>
> Mostly want to know because I just encouraged someone to just delete the
> index dir based on my
> experience...
>
> Thanks
> Erick
>
> On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher wrote:
>
>> the trick is, you have to remove the data/ directory, not just the
>> data/index subdirectory.  and of course then restart Solr.
>>
>> or delete *:*?commit=true, depending on what's the best fit for your ops.
>>
>>Erik
>>
>> On Feb 1, 2011, at 11:41 , Dennis Gearon wrote:
>>
>> > I tried removing the index directory once, and tomcat refused to start up
>> because
>> > it didn't have a segments file.
[...]

I have seen this error with Tomcat, but in my experience, this has been due
to doing a "rm data/index/*" rather than "rm -rf /data/index", or due to doing
this without first shutting down Tomcat.

Regards,
Gora



Re: geodist and spacial search

2011-02-03 Thread Eric Grobler
Hi Erick,

Thanks I saw that example, but I am trying to sort by distance AND specify
the max distance in 1 query.

The reason is:
running bbox on 2 million documents with a 20km distance takes only 200ms.
Sorting 2 million documents by distance takes over 1.5 seconds!

So it will be much faster for solr to first filter the 20km documents and
then to sort them.
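
Something like this is what I am hoping for - a sketch using the wiki's
example names (store and pt come from the wiki, not my schema):

...&q=*:*&fq={!bbox}&sfield=store&pt=45.15,-93.85&d=20&sort=geodist() asc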

Regards
Ericz

On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson wrote:

> Further down that very page ...
>
> Here's an example of sorting by distance ascending:
>
>   ...&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc
>
>   http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc
>
> The key is just the &sort=geodist(), I'm pretty sure that's independent of
> the bbox, but
> I could be wrong.
>
> Best
> Erick
>
> On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler wrote:
>
> > Hi
> >
> > In http://wiki.apache.org/solr/SpatialSearch
> > there is an example of a bbox filter and a geodist function.
> >
> > Is it possible to do a bbox filter and sort by distance - combine the
> two?
> >
> > Thanks
> > Ericz
> >
>


Re: Using terms and N-gram

2011-02-03 Thread openvictor Open
Thank you, I will do that and hopefully it will be handy!

But can someone explain to me the difference between CommonGramsFilterFactory
and NGramFilterFactory? (Maybe the solution is there.)

Thank you all,
best regards

2011/2/3 Grijesh 

>
> Use analysis.jsp to see what is happening at index time and query time
> with your input data. You can use highlighting to see if a match is found.
>
> -
> Thanx:
> Grijesh
> http://lucidimagination.com
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: changing schema

2011-02-03 Thread Gora Mohanty
On Thu, Feb 3, 2011 at 6:47 PM, Erick Erickson  wrote:
> Erik:
>
> Is this a Tomcat-specific issue? Because I regularly delete just the
> data/index directory on my Windows
> box running Jetty without any problems. (3_x and trunk)
>
> Mostly want to know because I just encouraged someone to just delete the
> index dir based on my
> experience...
>
> Thanks
> Erick
>
> On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher wrote:
>
>> the trick is, you have to remove the data/ directory, not just the
>> data/index subdirectory.  and of course then restart Solr.
>>
>> or delete *:*?commit=true, depending on what's the best fit for your ops.
>>
>>        Erik
>>
>> On Feb 1, 2011, at 11:41 , Dennis Gearon wrote:
>>
>> > I tried removing the index directory once, and tomcat refused to start up
>> because
>> > it didn't have a segments file.
[...]

I have seen this error with Tomcat, but in my experience, this has been due
to doing a "rm data/index/*" rather than "rm -rf /data/index", or due to doing
this without first shutting down Tomcat.

Regards,
Gora


Re: facet.mincount

2011-02-03 Thread Grijesh

Hi

facet.mincount does not work with the facet.date option, AFAIK.
There is an issue for it, SOLR-343, which has been resolved.
Applying the patch provided as the solution in that issue may solve the
problem.
The fix version for it may be 1.5.

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/facet-mincount-tp2411930p2414232.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Terms and termscomponent questions

2011-02-03 Thread openvictor Open
Dear Erick,

You were totally right about the fact that I didn't use any space to
separate words, cause SolR to concatenate words !
Everything is solved now. Thank you very much for your help !

Best regards,
Victor Kabdebon

2011/2/3 Erick Erickson 

> There are a couple of things going on here. First,
> WordDelimiterFilterFactory is
> splitting things up on letter/number boundaries. Take a look at:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> for a list of *some* of the available tokenizers. You may want to just use
> one of the others, or change the parameters to
> WordDelimiterFilterFactory
> so it doesn't split the way it does now.
>
> See the page: http://localhost:8983/solr/admin/analysis.jsp and check the
> "verbose"
> box to see what the effects of the various elements in your analysis chain
> are.
> This is a very important page for understanding the analysis part of the
> whole
> operation.
>
> Second, if you've been trying different things out, you may well have some
> old stuff in your index. When you delete documents, the terms are still in
> the index until an optimize. I'd advise starting with a clean slate for
> your
> experiments each time. The cheap way to do this is stop your server and
> delete /data/index. Delete the index directory too, not just the
> contents. So it's possible your TermsComponent is returning data from
> previous
> attempts, because I sure don't see how the concatenated terms would be
> in this index given the definition you've posted.
>
> And if none of that works, well, we'll try something else ..
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open wrote:
>
> > Dear Erick,
> >
> > Thank you for your answer, here is my fieldtype definition. I took the
> > standard one because I don't need a better one for this field
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> > catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > protected="protwords.txt"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > ignoreCase="true" expand="true"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> > generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > protected="protwords.txt"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > Now my field :
> >
> > <field name="p_field" type="text" indexed="true" stored="true"/>
> >
> > But I have a doubt now... Do I really put a space between words or is it
> > just a coma... If I only put a coma then the whole process is going to be
> > impacted ? What I don't really understand is that I find the separate
> > words,
> > but also their concatenation (but again in one direction only). Let me
> > explain : if a have "man" "bear" "pig" I will find :
> > "manbearpig" "bearpig" but never pigman or anyother combination in a
> > different order.
> >
> > Thank you very much
> > Best Regards,
> > Victor
> >
> > 2011/2/1 Erick Erickson 
> >
> > > Nope, this isn't what I'd expect. There are a couple of possibilities:
> > > 1> check out what WordDelimiterFilterFactory is doing, although
> > > if you're really sending spaces that's probably not it.
> > > 2> Let's see the <fieldtype> and <field> definitions for the field
> > > in question. type="text" doesn't say anything about analysis,
> > > and that's where I'd expect you're having trouble. In particular
> > > if your analysis chain uses KeywordTokenizerFactory for instance.
> > > 3> Look at the admin/schema browse page, look at your field and
> > > see what the actual tokens are. That'll tell you what
> TermsComponents
> > > is returning, perhaps the concatenation is happening somewhere
> > > else.
> > >
> > > Bottom line: Solr will not concatenate terms like this unless you tell
> it
> > > to,
> > > so I suspect you're telling it to, you just don't realize it ...
> > >
> > > Best
> > > Erick
> > >
> > > On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open wrote:
> > >
> > > > Dear Solr users,
> > > >
> > > > I am currently using SolR and TermsComponents to make an auto suggest
> > for
> > > > my
> > > > website.
> > > >
> > > > I have a field called p_field indexed and stored with type="text" in
> > the
> > > > schema xml. Nothing out of the usual.
> > > > I feed to Solr a set of words separated by a coma and a space such as
> > > (for
> > > > two documents) :
> > > >
> > > > Document 1:
> > > > word11, word12, word13. word14
> > > >
> > > > Document 2:
> > > > word21, word22, word23. word24
> > > >
> > > >
> > > > When I use my newly designed field I get things for the prefix
> "word1"
> > :
> > > > word11, word12, word13. word14 word11word12 word11word13 etc...
> > > > Is it normal to have the concatenation of words and not only the
> words
> > > > indexed ? Did I miss something about Terms ?
> > > >
> > > > Thank you very much,
> > > > Best regards all,
> > > > Victor
> > > >
> > >
> >
>


Re: My spellchecker experiment

2011-02-03 Thread Robert Muir
On Thu, Feb 3, 2011 at 8:55 AM, Emmanuel Espina
 wrote:
> It uses fuzzy queries instead of a ngram query, and then I rank the results
> by word frequency in the text with the aid of a python script (all that is
> explained in the post). I got pretty good results (between 50% and 90%
> improvements), but slower (about double time).
>

Hi Emmanuel:

I think it's great you are evaluating different techniques here; our
spellchecking could use some help :)

By the way: we added a new spellchecking technique that sounds quite
similar to what you describe (DirectSpellChecker),
but hopefully without the performance issues.
It's only available in trunk (http://svn.apache.org/repos/asf/lucene/dev/trunk/)

I tried to do a very rough evaluation on its jira issue:
https://issues.apache.org/jira/browse/LUCENE-2507, but nothing very
serious and as in-depth as what it looks like you did.

Anyway, if you want to play you can experiment with it either at the
lucene level (its in contrib/spellchecker) or via solr, by using
DirectSolrSpellChecker... though I think the parameters in the example
solrconfig are likely not the best :)

I have an app using this more fleshed-out config (in combination with
the new collation options), and it seems to be reasonable:



<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <int name="minPrefix">1</int>
    <int name="maxEdits">2</int>
    <int name="maxInspections">25</int>
    <int name="minQueryLength">3</int>
    <str name="comparatorClass">freq</str>
    <float name="thresholdTokenFrequency">1</float>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>
</searchComponent>
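
A request against it then looks something like this (standard spellcheck
parameters; the values are just what I happen to use):

...&q=speling&spellcheck=true&spellcheck.count=5&spellcheck.collate=true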



Re: facet.mincount

2011-02-03 Thread Savvas-Andreas Moysidis
Ahh, I see your point. Well, if that's true, then facet.missing/facet.method
are also not supported?

I'm not sure if this is the case, or whether the Date Faceting Parameters =
the Field Value Faceting Parameters + the extra ones.
Maybe the page author(s) can clarify.

On 3 February 2011 11:32, dan sutton  wrote:

> facet.mincount is grouped only under the field faceting parameters, not
> the date faceting parameters.
>
> On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis
>  wrote:
> > Hi Dan,
> >
> > I'm probably just not able to spot this, but where does the wiki page
> > mention that the facet.mincount is not applicable on date fields?
> >
> > On 3 February 2011 10:55, Isan Fulia  wrote:
> >
> >> I am using solr1.4.1 release version
> >> I got the following error while using facet.mincount
> >> java.lang.IllegalStateException: STREAM
> >>at org.mortbay.jetty.Response.getWriter(Response.java:571)
> >>at
> >> org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158)
> >>at
> >>
> org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151)
> >>at
> >>
> org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208)
> >>at
> >>
> >>
> org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144)
> >>at
> >>
> >>
> org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95)
> >>at
> >>
> >>
> org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397)
> >>at
> >> org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
> >>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >>at
> >>
> >>
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373)
> >>at
> >> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
> >>at
> org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
> >>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >>at
> >> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
> >>at
> >> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
> >>at
> >>
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>at
> >> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >>at
> >> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> >>at
> >> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> >>at
> org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
> >>at
> org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
> >>at
> >> org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431)
> >>at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> >>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> >>at
> >> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
> >>at
> >>
> >>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098)
> >>at
> >>
> >>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286)
> >>at
> >>
> >>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> >>at
> >> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
> >>at
> >>
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >>at
> >> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> >>at
> >> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
> >>at
> >> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
> >>at
> >>
> >>
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
> >>at
> >>
> >>
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> >>at
> >> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
> >>at org.mortbay.jetty.Server.handle(Server.java:285)
> >>at
> >> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
> >>at
> >>
> >>
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
> >>at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
> >>at
> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
> >>at
> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
> >>at
> >>
> >>
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
> >>at
> >>
> >>
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
> >>
> >>
> >> On 3 February 2011 16:17, 

RE: value for maxFieldLength

2011-02-03 Thread McGibbney, Lewis John
Thank you Erick

Lewis


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 03 February 2011 13:25
To: solr-user@lucene.apache.org
Subject: Re: value for maxFieldLength

This is not really very large, Solr should handle this easily (assuming
you've given it enough memory) so I'd go with a large number, say
20M. If you start running out of memory, then you've probably given
the JVM too little memory.

But Solr should handle this without a burp.

Best
Erick

On Wed, Feb 2, 2011 at 10:20 AM, McGibbney, Lewis John <
lewis.mcgibb...@gcu.ac.uk> wrote:

> Hello list,
>
> I am aware that setting the value of maxFieldLength in solrconfig.xml too
> high may/will result in out-of-mem errors. I wish to provide content
> extraction on a number of pdf documents which are large, by large I mean
> 8-11MB (occasionally more), and I am also not sure how many terms reside in
> each field when it is indexed. My question is therefore what is a sensible
> number to set this value to in order to include the majority/all terms
> within documents of this size.
>
> Thank you
>
> Lewis
>
>
> Glasgow Caledonian University is a registered Scottish charity, number
> SC021474
>
> Winner: Times Higher Education's Widening Participation Initiative of the
> Year 2009 and Herald Society's Education Initiative of the Year 2009.
>
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
>
> Winner: Times Higher Education's Outstanding Support for Early Career
> Researchers of the Year 2010, GCU as a lead with Universities Scotland
> partners.
>
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
>

Email has been scanned for viruses by Altman Technologies' email management 
service - www.altman.co.uk/emailsystems

Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education’s Widening Participation Initiative of the Year 
2009 and Herald Society’s Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Winner: Times Higher Education’s Outstanding Support for Early Career 
Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html


Re: DataImportHandler: no queries when using entity=something

2011-02-03 Thread Erick Erickson
Here's a magic URL, not available from the admin page that may help
debugging:

/solr/admin/dataimport.jsp
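
If I remember right, you can also ask DIH for debug output on the command
itself, something like:

/solr/dataimport?command=full-import&entity=games&debug=on&verbose=true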

Best
Erick

On Wed, Feb 2, 2011 at 7:38 PM, Jon Drukman  wrote:

> So I'm trying to update a single entity in my index using
> DataImportHandler.
>
> http://solr:8983/solr/dataimport?command=full-import&entity=games
>
> It ends near-instantaneously without hitting the database at all,
> apparently.
>
> Status shows:
>
> 0
> 0
> 0
> 0
> 
> Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
> 
> 2011-02-02 16:24:13
> 2011-02-02 16:24:13
> 0:0:0.20
>
> The query isn't that extreme.  It returns 8771 rows in about 3 seconds.
>
> How can I debug this?
>
>


Re: Reg filter criteria on multivalued attribute

2011-02-03 Thread Erick Erickson
Hmmm, why doesn't +relationship:DEF_BY -relationship:BEL_TO
work?

Then I don't think the second part matters...
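
As a filter query on the URL, that would be something like (the + has to
be URL-encoded as %2B):

...&fq=%2Brelationship:DEF_BY+-relationship:BEL_TO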

Best
Erick

On Wed, Feb 2, 2011 at 12:09 PM, bbarani  wrote:

>
> Hi,
>
> I have a question on filtering a multivalued attribute. Is there a way to
> filter a multivalued attribute based on a particular value inside that
> attribute?
>
> Consider the example below.
>
> <arr name="relationship">
>   <str>DEF_BY</str>
>   <str>BEL_TO</str>
> </arr>
>
> I want to do a search which returns only the results that have the
> relationship DEF_BY and not BEL_TO. Currently if I do a normal search for
> DEF_BY, documents which contain DEF_BY along with other relationships are
> returned, whereas I want only the documents that contain just DEF_BY under
> relationship. Also, is there a way to make Solr return documents based on
> the number of elements in a multivalued attribute? If that's possible I
> can first make Solr return those documents and then apply a filter for my
> search on top of the results returned.
>
> Is there a way to write a query to do this? Any pointers or help in this
> regard would be appreciated..
>
> Thanks,
> Barani
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Reg-filter-criteria-on-multivalued-attribute-tp2406904p2406904.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: geodist and spacial search

2011-02-03 Thread Erick Erickson
Further down that very page ...

Here's an example of sorting by distance ascending:

   ...&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc

   http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc


The key is just the &sort=geodist(), I'm pretty sure that's independent of
the bbox, but
I could be wrong.

Best
Erick

On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler wrote:

> Hi
>
> In http://wiki.apache.org/solr/SpatialSearch
> there is an example of a bbox filter and a geodist function.
>
> Is it possible to do a bbox filter and sort by distance - combine the two?
>
> Thanks
> Ericz
>


Re: value for maxFieldLength

2011-02-03 Thread Erick Erickson
This is not really very large, Solr should handle this easily (assuming
you've given it enough memory) so I'd go with a large number, say
20M. If you start running out of memory, then you've probably given
the JVM too little memory.

But Solr should handle this without a burp.
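
(In solrconfig.xml that would be something like
<maxFieldLength>20000000</maxFieldLength> in the <indexDefaults> section;
the exact number is just a generous guess.)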

Best
Erick

On Wed, Feb 2, 2011 at 10:20 AM, McGibbney, Lewis John <
lewis.mcgibb...@gcu.ac.uk> wrote:

> Hello list,
>
> I am aware that setting the value of maxFieldLength in solrconfig.xml too
> high may/will result in out-of-mem errors. I wish to provide content
> extraction on a number of pdf documents which are large, by large I mean
> 8-11MB (occasionally more), and I am also not sure how many terms reside in
> each field when it is indexed. My question is therefore what is a sensible
> number to set this value to in order to include the majority/all terms
> within documents of this size.
>
> Thank you
>
> Lewis
>
>
> Glasgow Caledonian University is a registered Scottish charity, number
> SC021474
>
> Winner: Times Higher Education's Widening Participation Initiative of the
> Year 2009 and Herald Society's Education Initiative of the Year 2009.
>
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
>
> Winner: Times Higher Education's Outstanding Support for Early Career
> Researchers of the Year 2010, GCU as a lead with Universities Scotland
> partners.
>
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
>


Re: Partial matches don't work (solr.NGramFilterFactory

2011-02-03 Thread Tomás Fernández Löbbe
On Wed, Feb 2, 2011 at 4:44 PM, Script Head  wrote:

> Yes, I have tried searching on text_ngrams as well and it produces no
> results.
>
> On a related note, since I have  dest="text"/> wouldn't the ngrams produced by text_ngrams field
> definition also be available within the text field?
>
No, look at:
http://wiki.apache.org/solr/SchemaXml#Copy_Fields

Solr will apply the corresponding analysis chain for each field.

Anyway, you should be
able to find the document when doing queries like "text_ngrams:hippo".
I can see you are storing the field "text_ngrams"; when you search for
Hippopotamus (and find results), how does the field "text_ngrams" look on
the returned docs? You should see the NGrams there (the same data that you
should see when using the analysis page of the Solr admin).
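
For example, something like this (stock handler and host, adjust to your
setup) should both match on the ngrams and return them:

http://localhost:8983/solr/select?q=text_ngrams:hippo&fl=text,text_ngrams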

Tomás

>
>
> 2011/2/2 Tomás Fernández Löbbe :
> > About this:
> >
> > <copyField ... dest="text"/>
> >
> > The NGrams are going to be indexed on the field "text_ngrams", not on
> > "text". For the field "text", Solr will apply the text analysis (which I
> > guess doesn't have NGrams). You have to search on the "text_ngrams"
> field,
> > something like "text_ngrams:hippo" or "text_ngrams:potamu". Are you
> > searching like this?
> >
> > Tomás
> >
> > On Wed, Feb 2, 2011 at 4:07 PM, Script Head 
> wrote:
> >
> >> Hello,
> >>
> >> I have the following definitions in my schema.xml:
> >>
> >> <fieldType name="..." class="solr.TextField">
> >>   <analyzer>
> >>     <tokenizer class="..."/>
> >>     <filter class="solr.NGramFilterFactory" minGramSize="..."
> >> maxGramSize="15"/>
> >>   </analyzer>
> >> </fieldType>
> >> ...
> >> <field name="text_ngrams" type="..." indexed="true"
> >> stored="true"/>
> >> ...
> >>
> >> There is a document "Hippopotamus is fatter than a Platypus" indexed.
> >> When I search for "Hippopotamus" I receive the expected result. When I
> >> search for any partial such as "Hippo" or "potamu" I get nothing. I
> >> could use some guidance.
> >>
> >> Script Head
> >>
> >
>


Re: changing schema

2011-02-03 Thread Erick Erickson
Erik:

Is this a Tomcat-specific issue? Because I regularly delete just the
data/index directory on my Windows
box running Jetty without any problems. (3_x and trunk)

Mostly want to know because I just encouraged someone to just delete the
index dir based on my
experience...

Thanks
Erick

On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher wrote:

> the trick is, you have to remove the data/ directory, not just the
> data/index subdirectory.  and of course then restart Solr.
>
> or delete *:*?commit=true, depending on what's the best fit for your ops.
>
>Erik
>
> On Feb 1, 2011, at 11:41 , Dennis Gearon wrote:
>
> > I tried removing the index directory once, and tomcat refused to start up
> because
> > it didn't have a segments file.
> >
> >
> >
> >
> > - Original Message 
> > From: Erick Erickson 
> > To: solr-user@lucene.apache.org
> > Sent: Tue, February 1, 2011 5:04:51 AM
> > Subject: Re: changing schema
> >
> > That sounds right. You can cheat and just remove /data/index
> > rather than delete *:* though (you should probably do that with the Solr
> > instance stopped)
> >
> > Make sure to remove the directory "index" as well.
> >
> > Best
> > Erick
> >
> > On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon 
> wrote:
> >
> >> Anyone got a great little script for changing a schema?
> >>
> >> i.e., after changing:
> >> database,
> >> the view in the database for data import
> >> the data-config.xml file
> >> the schema.xml file
> >>
> >> I BELIEVE that I have to run:
> >> a delete command for the whole index *:*
> >> a full import and optimize
> >>
> >> This all sound right?
> >>
> >> Dennis Gearon
> >>
> >>
> >> Signature Warning
> >> 
> >> It is always a good idea to learn from your own mistakes. It is usually
> a
> >> better
> >> idea to learn from others’ mistakes, so you do not have to make them
> >> yourself.
> >> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> >>
> >>
> >> EARTH has a Right To Life,
> >> otherwise we all die.
> >>
> >>
> >
>
>


Re: Terms and termscomponent questions

2011-02-03 Thread Erick Erickson
There are a couple of things going on here. First,
WordDelimiterFilterFactory is
splitting things up on letter/number boundaries. Take a look at:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

for a list of *some* of the available tokenizers. You may want to just use
one of the others, or change the parameters to
WordDelimiterFilterFactory
so it doesn't split the way it does now.

See the page: http://localhost:8983/solr/admin/analysis.jsp and check the
"verbose"
box to see what the effects of the various elements in your analysis chain
are.
This is a very important page for understanding the analysis part of the
whole
operation.

Second, if you've been trying different things out, you may well have some
old stuff in your index. When you delete documents, the terms are still in
the index until an optimize. I'd advise starting with a clean slate for your
experiments each time. The cheap way to do this is stop your server and
delete /data/index. Delete the index directory too, not just the
contents. So it's possible your TermsComponent is returning data from
previous
attempts, because I sure don't see how the concatenated terms would be
in this index given the definition you've posted.

And if none of that works, well, we'll try something else ..

Best
Erick

On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open wrote:

> Dear Erick,
>
> Thank you for your answer, here is my fieldtype definition. I took the
> standard one because I don't need a better one for this field
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> Now my field :
>
> <field name="p_field" type="text" indexed="true" stored="true"/>
>
> But I have a doubt now... Do I really put a space between words or is it
> just a coma... If I only put a coma then the whole process is going to be
> impacted ? What I don't really understand is that I find the separate
> words,
> but also their concatenation (but again in one direction only). Let me
> explain : if a have "man" "bear" "pig" I will find :
> "manbearpig" "bearpig" but never pigman or anyother combination in a
> different order.
>
> Thank you very much
> Best Regards,
> Victor
>
> 2011/2/1 Erick Erickson 
>
> > Nope, this isn't what I'd expect. There are a couple of possibilities:
> > 1> check out what WordDelimiterFilterFactory is doing, although
> > if you're really sending spaces that's probably not it.
> > 2> Let's see the <fieldtype> and <field> definitions for the field
> > in question. type="text" doesn't say anything about analysis,
> > and that's where I'd expect you're having trouble. In particular
> > if your analysis chain uses KeywordTokenizerFactory for instance.
> > 3> Look at the admin/schema browse page, look at your field and
> > see what the actual tokens are. That'll tell you what TermsComponents
> > is returning, perhaps the concatenation is happening somewhere
> > else.
> >
> > Bottom line: Solr will not concatenate terms like this unless you tell it
> > to,
> > so I suspect you're telling it to, you just don't realize it ...
> >
> > Best
> > Erick
> >
> > On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open wrote:
> >
> > > Dear Solr users,
> > >
> > > I am currently using SolR and TermsComponents to make an auto suggest
> for
> > > my
> > > website.
> > >
> > > I have a field called p_field indexed and stored with type="text" in
> the
> > > schema xml. Nothing out of the usual.
> > > I feed to Solr a set of words separated by a coma and a space such as
> > (for
> > > two documents) :
> > >
> > > Document 1:
> > > word11, word12, word13. word14
> > >
> > > Document 2:
> > > word21, word22, word23. word24
> > >
> > >
> > > When I use my newly designed field I get things for the prefix "word1"
> :
> > > word11, word12, word13. word14 word11word12 word11word13 etc...
> > > Is it normal to have the concatenation of words and not only the words
> > > indexed ? Did I miss something about Terms ?
> > >
> > > Thank you very much,
> > > Best regards all,
> > > Victor
> > >
> >
>


Re: escaping parenthesis in search query don't work...

2011-02-03 Thread Erick Erickson
WordDelimiterFilterFactory is probably stripping out the parens. If you try
running your terms through http://localhost:8983/solr/admin/analysis.jsp
you'll see the effects of
various tokenizers and filters, be sure to check
the "verbose" checkbox.

Here's a very good place to start understanding the intention of
the various options:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

In particular, about WordDelimiterFilterFactory:
it splits on intra-word delimiters (all non-alphanumeric characters), e.g.:

   "Wi-Fi" -> "Wi", "Fi"


Best
Erick

On Tue, Feb 1, 2011 at 8:52 AM, Pierre-Yves LANDRON wrote:

>
> Hello !I've seen that in order to search term with parenthesis=2C those
> have to be=escaped as in title:\(term\).But it doesn't seem to work -
> parenthesis are=n't taken in account.here is the field type I'm using to
> index these data :class="solr.TextField" positionIncrementGap="100">
>   class="solr.WhitespaceTokenizerFactory"/>
>   class="solr.StopFilterFactory"
>  ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true" />   class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>  class="solr.LowerCaseFilterFactory"/>   
> language="French" />class="solr.RemoveDuplicatesTokenFilterFactory"/>
>  
> 
>   synonyms="synonyms.txt"
>   ignoreCase="true"
>   expand="true"/>  class="solr.StopFilterFactory"
>  words="stopwords.txt"
> ignoreCase="true" /> class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>  class="solr.LowerCaseFilterFactory"/>   
> language="French" />class="solr.RemoveDuplicatesTokenFilterFactory"/>
>  
> How can I search parenthesis within my query ?Thanks,P.
>


Re: Malformed XML with exotic characters

2011-02-03 Thread Markus Jelsma
Hi

I've seen almost all funky charsets, but Gothic is always trouble. I'm also
unsure if it's really a bug in Solr. It could well be Xerces being unable
to cope. Besides, most systems indeed don't handle Gothic well. This mail
client does, but my terminal can't find its cursor after (properly)
displaying such text.
 
http://got.wikipedia.org/wiki/%F0%90%8C%B7%F0%90%8C%B0%F0%90%8C%BF%F0%90%8C%B1%F0%90%8C%B9%F0%90%8C%B3%F0%90%8C%B0%F0%90%8C%B1%F0%90%8C%B0%F0%90%8C%BF%F0%90%8D%82%F0%90%8C%B2%F0%90%8D%83/Haubidabaurgs

Thanks for the input.

Cheers,

On Tuesday 01 February 2011 19:59:33 Robert Muir wrote:
> Hi, it might only be a problem with your xml tools (e.g. firefox).
> the problem here is characters outside of the basic multilingual plane
> (in this case Gothic).
> XML tools typically fall apart on these portions of unicode (in lucene
> we recently reverted to a patched/hacked copy of xerces specifically
> for this reason).
> 
> If you care about characters outside of the basic multilingual plane
> actually working, unfortunately you have to start being very very very
> particular about what software you use... you can assume most
> software/setups WON'T work.
> For example, if you were to use mysql's "utf8" character set you would
> find it doesn't actually support all of UTF-8! in this case you would
> need to use the recent 'utf8mb4' or something instead, that is
> actually utf-8!
> Thats just one example of a well-used piece of software that suffers
> from issues like this, there are others.
> 
> Its for reasons like these that if support for these languages is
> important to you, I would stick with the most simple/textual methods
> for input and output: e.g. using things like CSV and JSON if you can.
> I would also fully test every component/jar in your application
> individually and once you get it working, don't ever upgrade.
> 
> In any case, if you are having problems with characters outside of the
> basic multilingual plane, and you suspect its actually a bug in Solr,
> please open a JIRA issue, especially if you can provide some way to
> reproduce it
> 


Re: facet.mincount

2011-02-03 Thread dan sutton
facet.mincount is grouped only under the field faceting parameters, not
the date faceting parameters.

On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis
 wrote:
> Hi Dan,
>
> I'm probably just not able to spot this, but where does the wiki page
> mention that the facet.mincount is not applicable on date fields?
>
> On 3 February 2011 10:55, Isan Fulia  wrote:
>
>> I am using solr1.4.1 release version
>> I got the following error while using facet.mincount
>> java.lang.IllegalStateException: STREAM
>>        at org.mortbay.jetty.Response.getWriter(Response.java:571)
>>        at
>> org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158)
>>        at
>> org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151)
>>        at
>> org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208)
>>        at
>>
>> org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144)
>>        at
>>
>> org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95)
>>        at
>>
>> org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397)
>>        at
>> org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>        at
>>
>> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373)
>>        at
>> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
>>        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>        at
>> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
>>        at
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
>>        at
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>        at
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>        at
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>>        at
>> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>>        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
>>        at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
>>        at
>> org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>>        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>>        at
>> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
>>        at
>>
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098)
>>        at
>>
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286)
>>        at
>>
>> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>>        at
>> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>>        at
>> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>>        at
>> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>>        at
>> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>>        at
>> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>>        at
>>
>> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>>        at
>>
>> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>>        at
>> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>>        at org.mortbay.jetty.Server.handle(Server.java:285)
>>        at
>> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>>        at
>>
>> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>>        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>>        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>>        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>>        at
>>
>> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>>        at
>>
>> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>>
>>
>> On 3 February 2011 16:17, dan sutton  wrote:
>>
>> > I don't think facet.mincount works with date faceting, see here:
>> >
>> > http://wiki.apache.org/solr/SimpleFacetParameters
>> >
>> > Dan
>> >
>> > On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia 
>> > wrote:
>> > > Any query followed by
>> > >
>> > >
>> >
>> &facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1
>> > >
>> > > On 3 February 2011 15:14, Savvas-Andreas Moysidis <
>> > > savvas.andreas.moysi...@googlemail.com> wrote:
>> 

How effective are faceted queries ?

2011-02-03 Thread csj

Hi,

I was wondering if there exist any performance characteristics for facets.
As I understand facets, they are subqueries that perform certain
counts on the result set. This means that a facet will be evaluated on every
shard along with the main query.

But how will the facet query be evaluated? If the result set is sorted, will
the facet query take advantage of that when evaluating?

Example: a search is done for all documents within a given range of dates on
the field createdDate. The result set is sorted by that field. Would a facet
query then be able to use this sorting when it counts how many documents
were created per week, or per day for that matter?
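
For concreteness, the kind of query I mean is something like this
(createdDate is my field; the rest is standard date faceting, values made
up):

...&q=createdDate:[2011-01-01T00:00:00Z TO 2011-02-01T00:00:00Z]&facet=true&facet.date=createdDate&facet.date.start=2011-01-01T00:00:00Z&facet.date.end=2011-02-01T00:00:00Z&facet.date.gap=%2B1DAY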

Kind regards,

Christian Sonne Jensen
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-effective-are-faceted-queries-tp2412689p2412689.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: facet.mincount

2011-02-03 Thread Grijesh

I am also not seeing where the wiki mentions that facet.mincount will not
work with date faceting.

But I have checked by query and it is not working for me either.
We may have to report a bug.

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/facet-mincount-tp2411930p2412660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: facet.mincount

2011-02-03 Thread Savvas-Andreas Moysidis
Hi Dan,

I'm probably just not able to spot this, but where does the wiki page
mention that the facet.mincount is not applicable on date fields?

On 3 February 2011 10:55, Isan Fulia  wrote:

> I am using solr1.4.1 release version
> I got the following error while using facet.mincount
> java.lang.IllegalStateException: STREAM
>at org.mortbay.jetty.Response.getWriter(Response.java:571)
>at
> org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158)
>at
> org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151)
>at
> org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208)
>at
>
> org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144)
>at
>
> org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95)
>at
>
> org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397)
>at
> org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>at
>
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373)
>at
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
>at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>at
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
>at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
>at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
>at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
>at
> org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>at
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
>at
>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098)
>at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286)
>at
>
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>at
>
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>at
>
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>at org.mortbay.jetty.Server.handle(Server.java:285)
>at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>at
>
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
>at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
>at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
>at
>
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
>at
>
> org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
>
>
> On 3 February 2011 16:17, dan sutton  wrote:
>
> > I don't think facet.mincount works with date faceting, see here:
> >
> > http://wiki.apache.org/solr/SimpleFacetParameters
> >
> > Dan
> >
> > On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia  wrote:
> > > Any query followed by
> > >
> > > &facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1
> > >
> > > On 3 February 2011 15:14, Savvas-Andreas Moysidis <savvas.andreas.moysi...@googlemail.com> wrote:
> > >
> > >> could you post the query you are submitting to Solr?
> > >>
> > >> On 3 February 2011 09:33, Isan Fulia  wrote:
> > >>
> > >> > Hi all,
> > >> > Even after setting facet.mincount=1, it is still showing results with count = 0.
> > >> > Does anyone know why this is happening?

Re: facet.mincount

2011-02-03 Thread Isan Fulia
I am using the Solr 1.4.1 release version.
I got the following error while using facet.mincount:
java.lang.IllegalStateException: STREAM
at org.mortbay.jetty.Response.getWriter(Response.java:571)
at org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158)
at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151)
at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208)
at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144)
at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95)
at org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464)
at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268)
at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)


On 3 February 2011 16:17, dan sutton  wrote:

> I don't think facet.mincount works with date faceting, see here:
>
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> Dan
>
> On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia  wrote:
> > Any query followed by
> >
> > &facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1
> >
> > On 3 February 2011 15:14, Savvas-Andreas Moysidis <savvas.andreas.moysi...@googlemail.com> wrote:
> >
> >> could you post the query you are submitting to Solr?
> >>
> >> On 3 February 2011 09:33, Isan Fulia  wrote:
> >>
> >> > Hi all,
> >> > Even after setting facet.mincount=1, it is still showing results with count = 0.
> >> > Does anyone know why this is happening?
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Isan Fulia.
> >> >
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
> >
>



-- 
Thanks & Regards,
Isan Fulia.


Re: facet.mincount

2011-02-03 Thread dan sutton
I don't think facet.mincount works with date faceting, see here:

http://wiki.apache.org/solr/SimpleFacetParameters
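
As a quick illustration of the difference (assuming an ordinary facet field named city alongside the aUpdDt date field), facet.mincount is honoured for field faceting:

&facet=on&facet.field=city&facet.mincount=1

whereas facet.date in Solr 1.4 returns a count for every gap in the requested range, zero or not, so the empty buckets have to be filtered out on the client side.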

Dan

On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia  wrote:
> Any query followed by
>
> &facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1
>
> On 3 February 2011 15:14, Savvas-Andreas Moysidis <savvas.andreas.moysi...@googlemail.com> wrote:
>
>> could you post the query you are submitting to Solr?
>>
>> On 3 February 2011 09:33, Isan Fulia  wrote:
>>
>> > Hi all,
>> > Even after setting facet.mincount=1, it is still showing results with count = 0.
>> > Does anyone know why this is happening?
>> >
>> > --
>> > Thanks & Regards,
>> > Isan Fulia.
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Isan Fulia.
>


Re: from long to tlong, compatible?

2011-02-03 Thread Dan G
Thanks for the fast answer.
Yeah, I was afraid I would need to re-index for the precision change to take effect in this case.


- Original Message 
From: Yonik Seeley 
To: solr-user@lucene.apache.org
Sent: Wed, February 2, 2011 10:12:42 PM
Subject: Re: from long to tlong, compatible?

On Wed, Feb 2, 2011 at 3:46 PM, Dan G  wrote:

> My question is whether it would be possible to just change the field to the
> preferred type "tlong" with a precisionStep of "8"?
>
> Would this change be compatible with my indexed data, or should I re-index
> the data (a pain with 800+M docs :))?
>

I think you'll need to re-index, or range queries on that field will miss
many of the documents you've already indexed with precisionStep=0.

-Yonik
http://lucidimagination.com
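
For reference, this is roughly how the two types differ in the Solr 1.4 example schema.xml (the exact attributes in your own schema may vary):

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

Values indexed with precisionStep="0" carry only the full-precision term, so the lower-precision terms that tlong range queries rely on are simply not in the index until the documents are re-indexed.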





Re: Open Too Many Files

2011-02-03 Thread Grijesh

The best option is to use <useCompoundFile>true</useCompoundFile>.

Decreasing mergeFactor may make indexing slow.
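
For reference, a minimal sketch of where this sits in solrconfig.xml (Solr 1.4 layout; surrounding settings omitted):

<mainIndex>
  <useCompoundFile>true</useCompoundFile>
  <mergeFactor>10</mergeFactor>
</mainIndex>

The compound format packs each segment's files into a single .cfs file, which keeps the number of open files down without touching mergeFactor.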

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Open-Too-Many-Files-tp2406289p2412415.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: facet.mincount

2011-02-03 Thread Grijesh

Have you looked at your log file? What does it say? Did any exception occur?
I have never seen facet.mincount=1 fail to work.
What version of Solr are you using?

-
Thanx:
Grijesh
http://lucidimagination.com
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/facet-mincount-tp2411930p2412389.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: facet.mincount

2011-02-03 Thread Isan Fulia
Any query followed by

&facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1
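
Decoded for readability (%2B is a URL-encoded +), those parameters are:

facet=on
facet.date=aUpdDt
facet.date.start=2011-01-02T08:00:00.000Z
facet.date.end=2011-02-03T08:00:00.000Z
facet.date.gap=+1HOUR
facet.mincount=1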

On 3 February 2011 15:14, Savvas-Andreas Moysidis <savvas.andreas.moysi...@googlemail.com> wrote:

> could you post the query you are submitting to Solr?
>
> On 3 February 2011 09:33, Isan Fulia  wrote:
>
> > Hi all,
> > Even after setting facet.mincount=1, it is still showing results with count = 0.
> > Does anyone know why this is happening?
> >
> > --
> > Thanks & Regards,
> > Isan Fulia.
> >
>



-- 
Thanks & Regards,
Isan Fulia.


Re: Open Too Many Files

2011-02-03 Thread Markus Jelsma
Or decrease the mergeFactor.

> or change the index to a compound-index
> 
> solrconfig.xml: <useCompoundFile>true</useCompoundFile>
> 
> so solr creates one index file and not thousands.
> 
> -
> --- System
> 
> 
> One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
> 1 Core with 31 Million Documents other Cores < 100.000
> 
> - Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
> - Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx


Re: DataImportHandler: no queries when using entity=something

2011-02-03 Thread Gora Mohanty
On Thu, Feb 3, 2011 at 3:23 PM, Darx Oman  wrote:
> Add "&clean=false" to the URL:
> http://solr:8983/solr/dataimport?command=full-import&entity=games&clean=false
>
> *clean* : (default 'true'). Tells whether to clean up the index before the
> indexing is started
[...]

Sorry, what does that have to do with the original poster's question?

Regards,
Gora


Re: DataImportHandler: no queries when using entity=something

2011-02-03 Thread Darx Oman
Check your log file; you might have a connection problem.


Re: DataImportHandler: no queries when using entity=something

2011-02-03 Thread Darx Oman
Add "&clean=false" to the URL:
http://solr:8983/solr/dataimport?command=full-import&entity=games&clean=false

*clean*: (default 'true'). Tells whether to clean up the index before the
indexing is started.
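
For context, entity=games selects one named <entity> from the DIH data-config.xml; a minimal sketch, with the data source and SQL made up for illustration:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/mydb" user="solr" password="secret"/>
  <document>
    <entity name="games" query="SELECT id, title FROM games">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>

Running command=full-import&entity=games executes only that entity, and clean=false stops the rest of the index from being wiped first.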




Re: facet.mincount

2011-02-03 Thread Savvas-Andreas Moysidis
could you post the query you are submitting to Solr?

On 3 February 2011 09:33, Isan Fulia  wrote:

> Hi all,
> Even after setting facet.mincount=1, it is still showing results with count = 0.
> Does anyone know why this is happening?
>
> --
> Thanks & Regards,
> Isan Fulia.
>


facet.mincount

2011-02-03 Thread Isan Fulia
Hi all,
Even after setting facet.mincount=1, it is still showing results with count = 0.
Does anyone know why this is happening?

-- 
Thanks & Regards,
Isan Fulia.


Re: Open Too Many Files

2011-02-03 Thread stockii

Or change the index to a compound index:

solrconfig.xml: <useCompoundFile>true</useCompoundFile>

so that Solr packs each segment into a single compound file instead of keeping thousands of separate files open.

-
--- System


One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 
1 Core with 31 Million Documents other Cores < 100.000

- Solr1 for Search-Requests - commit every Minute  - 4GB Xmx
- Solr2 for Update-Request  - delta every 2 Minutes - 4GB Xmx
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Open-Too-Many-Files-tp2406289p2411736.html
Sent from the Solr - User mailing list archive at Nabble.com.