using q= , adding fq=

2009-12-11 Thread Fer-Bj

We're running a 14M documents index. For each document we have:

   
   
   
   
   
(and a few other fields).

Our most common query looks like this:
q=cat_id:xxx AND geo_id:yyy&sort=id desc
where cat_id is the category (cars, sports, toys, etc.) the item belongs to,
and geo_id is the city/district the item belongs to.
So this query returns the list of documents posted in category xxx, region
yyy, sorted by ID DESC to get the newest first.

There are 2 questions I'd like to ask:

1) Would adding something like q=cat_id:xxx&fq=geo_id:yyy boost
performance?

2) We run into problems when we ask for a page at a large offset, e.g.
q=cat_id:xxx AND geo_id:yyy&start=544545
(note that we limit each result set to 50 docs).
When start is 500 or more, QTime is >= 5 seconds, while the average QTime
is < 100 ms.
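
The two request forms from question 1 and the paging arithmetic from
question 2 can be sketched in Python as below. This is only an illustration,
not a measured answer: the helper names are made up, xxx/yyy are the
placeholders used above, and whether fq helps depends on how often the same
filter is reused and served from the filter cache. The start + rows figure
reflects that Solr has to collect and sort that many top entries before it
can return the last 50.

```python
from urllib.parse import urlencode

ROWS = 50  # the post says result sets are capped at 50 documents


def single_query(cat_id, geo_id, page=1):
    """Both restrictions in q, as in the original request."""
    return urlencode({
        "q": f"cat_id:{cat_id} AND geo_id:{geo_id}",
        "sort": "id desc",
        "start": (page - 1) * ROWS,
        "rows": ROWS,
    })


def filtered_query(cat_id, geo_id, page=1):
    """Category stays in q, region moves to fq (a separately cached filter)."""
    return urlencode({
        "q": f"cat_id:{cat_id}",
        "fq": f"geo_id:{geo_id}",
        "sort": "id desc",
        "start": (page - 1) * ROWS,
        "rows": ROWS,
    })


def docs_ranked(page):
    """Number of top entries that must be collected for this page: start + rows."""
    return (page - 1) * ROWS + ROWS
```

For example, docs_ranked(10892) is 544600, which is the amount of sorting
work behind the start=544545 request above, compared with 50 for page 1.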

Any help or tips would be appreciated!

Thanks,



-- 
View this message in context: 
http://old.nabble.com/using-q%3D--%2C-adding-fq%3D-tp26753938p26753938.html
Sent from the Solr - User mailing list archive at Nabble.com.



SOLR 1.4: how to configure the improved chinese analyzer?

2009-12-09 Thread Fer-Bj

Hello,

Is there an existing FAQ or how-to on setting up the improved (and new?)
Chinese analyzer on Solr 1.4?

I'd appreciate any help you may provide on this.

Thanks,
-- 
View this message in context: 
http://old.nabble.com/SOLR-1.4%3A-how-to-configure-the-improved-chinese-analyzer--tp26706709p26706709.html



Re: Field Compression

2009-06-04 Thread Fer-Bj

Here is what we have:

All of our documents have a field called "small_body", a text field of at
most 60 characters where we store the "abstract" of each article.

We have about 8,000,000 documents indexed, and we usually display
small_body on our "listing pages".

Each listing page loads 50 documents at a time, so the small_body field we
want to compress has to be read and displayed on every page load.

I'll probably enable compression for this field and run a one-week test to
see the outcome, rolling it back if needed.

Last question: what's the best way to determine the compress threshold?
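
One rough way to approach the threshold question is to compare stored sizes
with and without compression for values of different lengths. The sketch
below is an assumption-laden illustration: it uses Python's zlib (deflate)
as a stand-in for whatever Solr does internally, it assumes the threshold
means "compress only values at least this many characters long", and
stored_bytes is a hypothetical helper, not a Solr API.

```python
import zlib


def stored_bytes(text, compress_threshold):
    """Approximate bytes stored for one field value.

    Assumption: values shorter than compress_threshold characters are
    stored as-is, longer ones are deflate-compressed.
    """
    raw = text.encode("utf-8")
    if len(text) < compress_threshold:
        return len(raw)
    return len(zlib.compress(raw))


def total_stored_bytes(values, compress_threshold):
    """Sum the approximate stored size over a sample of field values."""
    return sum(stored_bytes(v, compress_threshold) for v in values)
```

Running total_stored_bytes over a sample of real small_body values with a
few candidate thresholds would show whether compressing 60-character
abstracts saves anything at all; very short strings often do not shrink
once the compression header is counted.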

Grant Ingersoll-6 wrote:
> 
> 
> On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
> 
>>
>> It *will* cause performance issues if you load that field for a large
>> number of documents on a particular search. I know Lucene itself
>> has lazy field loading that helps in this case, but I don't know how
>> to persuade SOLR to use it (it may even lazy-load automatically).
>> But this is separate from searching...
> 
> Lazy loading is an option configured in the solrconfig.xml
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Field-Compression-tp15258669p23879859.html



Re: indexing Chienese langage

2009-06-04 Thread Fer-Bj

What we usually do to reindex is:

1. stop Solr
2. rm -r data  (to remove everything in /opt/solr/data/)
3. mkdir data
4. start Solr
5. start the reindex. This way we're sure no old copies of the index are
left behind.

To check the index size we do:
cd data
du -sh
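
The same size check can be scripted, for example to log the index size
before and after a reindex. This is a minimal sketch: dir_size_bytes is a
hypothetical helper, and it sums file lengths, so the figure can differ
slightly from du, which reports disk blocks.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, recursively."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

Calling dir_size_bytes("/opt/solr/data") and dividing by 1024 ** 3 gives the
size in GB.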



Otis Gospodnetic wrote:
> 
> 
> I can't tell what that analyzer does, but I'm guessing it uses n-grams?
> Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629
> instead?
> 
>  Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> ----- Original Message -----
>> From: Fer-Bj 
>> To: solr-user@lucene.apache.org
>> Sent: Thursday, June 4, 2009 2:20:03 AM
>> Subject: Re: indexing Chienese langage
>> 
>> 
>> We are trying SOLR 1.3 with Paoding Chinese Analyzer , and after
>> reindexing
>> the index size went from 1.5 Gb to 2.7 Gb.
>> 
>> Is that some expected behavior ?
>> 
>> Is there any switch or trick to avoid having a double + index file size?
>> 
>> Koji Sekiguchi-2 wrote:
>> > 
>> > CharFilter can normalize (convert) traditional chinese to simplified 
>> > chinese or vice versa,
>> > if you define mapping.txt. Here is the sample of Chinese character 
>> > normalization:
>> > 
>> > 
>> https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
>> > 
>> > See SOLR-822 for the detail:
>> > 
>> > https://issues.apache.org/jira/browse/SOLR-822
>> > 
>> > Koji
>> > 
>> > 
>> > revathy arun wrote:
>> >> Hi,
>> >>
>> >> When I index chinese content using chinese tokenizer and analyzer in
>> solr
>> >> 1.3 ,some of the chinese text files are getting indexed but others are
>> >> not.
>> >>
>> >> Since chinese has got many different language subtypes as in standard
>> >> chinese,simplified chinese etc which of these does the chinese
>> tokenizer
>> >> support and is there any method to find the type of  chiense language 
>> >> from
>> >> the file?
>> >>
>> >> Rgds
>> >>
>> >>  
>> > 
>> > 
>> > 
>> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/indexing-Chienese-langage-tp22033302p23879730.html



Re: Field Compression

2009-06-04 Thread Fer-Bj

Is it correct to assume that using field compression will cause performance
issues if we decide to allow search over this field?

ie:

  
 
   
 

If I add "compressed=true" to the BODY field and allow searching on body,
would that be a problem?
Conversely, what if I add compressed=true but never search on this field?
  

Stu Hood-3 wrote:
> 
> I just finished watching this talk about a column-store RDBMS, which has a
> long section on column compression. Specifically, it talks about the gains
> from compressing similar data together, and how lazily decompressing data
> only when it must be processed is great for memory/CPU cache usage.
> 
> http://youtube.com/watch?v=yrLd-3lnZ58
> 
> While interesting, its not relevant to Lucene's stored field storage. On
> the other hand, it did get me thinking about stored field compression and
> lazy field loading.
> 
> Can anyone give me some pointers about compressThreshold values that would
> be worth experimenting with? Our stored fields are often between 20 and
> 300 characters, and we're willing to spend more time indexing if it will
> make searching less IO bound.
> 
> Thanks,
> 
> Stu Hood
> Architecture Software Developer
> Mailtrust, a Rackspace Company
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Field-Compression-tp15258669p23865558.html



Re: indexing Chienese langage

2009-06-03 Thread Fer-Bj

We are trying SOLR 1.3 with the Paoding Chinese Analyzer, and after
reindexing the index size went from 1.5 GB to 2.7 GB.

Is that expected behavior?

Is there any switch or trick to keep the index from nearly doubling in size?

Koji Sekiguchi-2 wrote:
> 
> CharFilter can normalize (convert) traditional chinese to simplified 
> chinese or vice versa,
> if you define mapping.txt. Here is the sample of Chinese character 
> normalization:
> 
> https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> 
> See SOLR-822 for the detail:
> 
> https://issues.apache.org/jira/browse/SOLR-822
> 
> Koji
> 
> 
> revathy arun wrote:
>> Hi,
>>
>> When I index chinese content using chinese tokenizer and analyzer in solr
>> 1.3 ,some of the chinese text files are getting indexed but others are
>> not.
>>
>> Since chinese has got many different language subtypes as in standard
>> chinese,simplified chinese etc which of these does the chinese tokenizer
>> support and is there any method to find the type of  chiense language 
>> from
>> the file?
>>
>> Rgds
>>
>>   
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html



Re: Using Chinese / How to ?

2009-06-02 Thread Fer-Bj

We have now solved the problem with inserting new documents: the fix was to
remove the "special" ASCII chars that SOLR 1.3 does not accept in XML.
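
A minimal sketch of that kind of cleanup, assuming the documents are built
as strings before being posted as XML: it drops every character outside the
ranges the XML 1.0 specification allows, which includes ASCII 0 and 11. The
function name is made up for illustration and this is not the code we
actually ran.

```python
import re

# Characters allowed by XML 1.0: tab, LF, CR, and the ranges below.
# Everything else (for example ASCII 0 and 11) is removed.
_INVALID_XML_CHARS = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Return text with characters that XML 1.0 forbids removed."""
    return _INVALID_XML_CHARS.sub("", text)
```

Applying it to each field value before building the add request leaves
Chinese text, tabs and newlines untouched.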

The question now is: how do we configure SOLR 1.3 with Chinese support?

James liu-2 wrote:
> 
> u means how to config solr which support chinese?
> 
> Update problem?
> 
> On Tuesday, June 2, 2009, Fer-Bj  wrote:
>>
>> I'm sending 3 files:
>> - schema.xml
>> - solrconfig.xml
>> - error.txt (with the error description)
>>
>> I can confirm by now that this error is due to invalid characters for the
>> XML format (ASCII 0 or 11).
>> However, this problem now is taking a different direction: how to start
>> using the CJK instead of the english!
>> http://www.nabble.com/file/p23825881/error.txt error.txt
>> http://www.nabble.com/file/p23825881/solrconfig.xml solrconfig.xml
>> http://www.nabble.com/file/p23825881/schema.xml schema.xml
>>
>>
>> Grant Ingersoll-6 wrote:
>>>
>>> Can you provide details on the errors?  I don't think we have a
>>> specific how to, but I wouldn't think it would be much different from
>>> 1.2
>>>
>>> -Grant
>>> On May 31, 2009, at 10:31 PM, Fer-Bj wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> is there any "how to" already created to get me up using SOLR 1.3
>>>> running
>>>> for a chinese based website?
>>>> Currently our site is using SOLR 1.2, and we tried to move into 1.3
>>>> but we
>>>> couldn't complete our reindex as it seems like 1.3 is more strict
>>>> when it
>>>> comes to special chars.
>>>>
>>>> I would appreciate any help anyone may provide on this.
>>>>
>>>> Thanks!!
>>>>
>>>
>>> --
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>> using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>
>>
>>
>>
> 
> -- 
> regards
> j.L ( I live in Shanghai, China)
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Chinese---How-to---tp23810129p23844708.html



Re: Using Chinese / How to ?

2009-06-01 Thread Fer-Bj

I'm sending 3 files:
- schema.xml
- solrconfig.xml
- error.txt (with the error description) 

I can now confirm that this error is caused by characters that are invalid
in XML (ASCII 0 or 11).
However, the problem is now taking a different direction: how to start
using CJK instead of English!
http://www.nabble.com/file/p23825881/error.txt error.txt 
http://www.nabble.com/file/p23825881/solrconfig.xml solrconfig.xml 
http://www.nabble.com/file/p23825881/schema.xml schema.xml 


Grant Ingersoll-6 wrote:
> 
> Can you provide details on the errors?  I don't think we have a  
> specific how to, but I wouldn't think it would be much different from  
> 1.2
> 
> -Grant
> On May 31, 2009, at 10:31 PM, Fer-Bj wrote:
> 
>>
>> Hello,
>>
>> is there any "how to" already created to get me up using SOLR 1.3  
>> running
>> for a chinese based website?
>> Currently our site is using SOLR 1.2, and we tried to move into 1.3  
>> but we
>> couldn't complete our reindex as it seems like 1.3 is more strict  
>> when it
>> comes to special chars.
>>
>> I would appreciate any help anyone may provide on this.
>>
>> Thanks!!
>>
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-Chinese---How-to---tp23810129p23825881.html



Using Chinese / How to ?

2009-05-31 Thread Fer-Bj

Hello,

Is there an existing "how to" for getting SOLR 1.3 up and running for a
Chinese-language website?
Our site currently uses SOLR 1.2. We tried to move to 1.3, but we couldn't
complete our reindex, as 1.3 seems to be stricter when it comes to special
chars.

I would appreciate any help anyone may provide on this. 

Thanks!!
-- 
View this message in context: 
http://www.nabble.com/Using-Chinese---How-to---tp23810129p23810129.html