RE: *Very* slow Commit after upgrading to solr 1.3

2008-10-07 Thread Ben Shlomo, Yatir
So other than me doing trial & error, do you have any guidance on how to
configure the merge factor (and ramBufferSizeMB)?
Is there any "formula" that supplies the optimal values?
Thanks,
Yatir

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, October 07, 2008 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: *Very* slow Commit after upgrading to solr 1.3

On Tue, Oct 7, 2008 at 6:32 AM, Ben Shlomo, Yatir
<[EMAIL PROTECTED]> wrote:
> The problem is solved, see below.
> Since the performance is so sensitive to configuration - do you have a
> tip on how to determine the optimal configuration for
> mergeFactor, ramBufferSizeMB and other properties ?

The issue might have been your high merge factor coupled with changes
in how Lucene closes an index.  To prevent possible corruption on a
crash, Lucene now does an fsync on the index files before it writes
the new segment descriptor that references those files.  A high merge
factor means more segments, hence more segment files to sync on a
close.

-Yonik
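
For reference, a minimal sketch of the two solrconfig.xml settings discussed
in this thread, with the values from the Solr 1.3 example config (they live
in the indexDefaults/mainIndex sections there); treat them as a starting
point to tune rather than a formula:

    <indexDefaults>
      <!-- fewer segments per merge level means fewer files to fsync on close -->
      <mergeFactor>10</mergeFactor>
      <!-- flush segments by RAM usage instead of a fixed document count -->
      <ramBufferSizeMB>32</ramBufferSizeMB>
    </indexDefaults>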


> My original problem occurred even on a fresh rebuild of the index with
> solr 1.3
> To solve it I used the entire IndexWriter section settings from the
solr
> 1.3 example file
> This had a dramatic impact:
> I indexed 20 GB of data (52M docs)
> The total indexing time was 13 hours
> The index size was 30 GB
> The total commit time was less than 2 minutes
>
> Tomcat Log for reference
>
> Oct 5, 2008 9:43:24 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher 
> INFO: Opening [EMAIL PROTECTED] main
> Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
> commit
> INFO: end_commit_flush
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
>
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
>
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
> 0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
>
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
>
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
> 0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
>
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
>
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
> atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
>
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
>
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
> atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
>
>
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
>
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
> o=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
> INFO: autowarming result for [EMAIL PROTECTED] main
>
>
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
>
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
> o=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore registerSearcher
> INFO: [] Registered new searcher [EMAIL PROTECTED] main
> Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher close
> INFO: Closing [EMAIL PROTECTED] main
>
>
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
>
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
> 0.00,cumulative_inserts=0,cumulative_evictions=0}
>
>
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
>
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
> atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
>
>
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
>
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
> o=0.00,cumulative_inserts=0,cumulative_evictions=0}
> Oct 5, 2008 9:

RE: *Very* slow Commit after upgrading to solr 1.3

2008-10-07 Thread Ben Shlomo, Yatir
 take a
long time?

I notice your merge factor is 1000... this will create many files that
need to be sync'd
It may help to try the IndexWriter settings from the 1.3 example
setup... the important changes being:

<mergeFactor>10</mergeFactor>

<ramBufferSizeMB>32</ramBufferSizeMB>

-Yonik

On Mon, Sep 29, 2008 at 5:33 AM, Ben Shlomo, Yatir
<[EMAIL PROTECTED]> wrote:
> Hi!
>
>
>
> I am running on Windows 64-bit ...
> I have upgraded to solr 1.3 in order to use the distributed search.
>
> I haven't changed the solrConfig and the schema xml files during the
> upgrade.
>
> I am indexing ~ 350K documents (each one is about 0.5 KB in size)
>
> The indexing takes a reasonable amount of time (350 seconds)
>
> See tomcat log:
>
> INFO: {add=[8x-wbTscWftuu1sVWpdnGw==, VOu1eSv0obBl1xkj2jGjIA==,
> YkOm-nKPrTVVVyeCZM4-4A==, rvaq_TyYsqt3aBc0KKDVbQ==,
> 9NdzWXsErbF_5btyT1JUjw==, ...(398728 more)]} 0 349875
>
>
>
> But when I commit it takes more than an hour! (5000 seconds! The
> optimize after the commit took 14 seconds.)
>
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>
>
>
> P.S. It's not a machine problem; I moved to another machine and the same
> thing happened.
>
>
> I noticed something very strange during the time I wait for the
> commit:
>
> While the solr index is 210 MB in size,
>
> in the Windows task manager I noticed that the java process is making a
> HUGE amount of IO reads:
>
> It reads more than 350 GB! (which takes a lot of time)
>
> The process is constantly taking 25% of the CPU resources.
>
> All my autowarmCount settings in solrconfig do not exceed 256...
>
>
>
> Any more ideas to check?
>
> Thanks.
>
>
>
>
>
>
>
> Here is part of my solrConfig file:
>
> -   < -

>
> - 
>
>  false
>
>  1000
>
>  1000
>
>  2147483647
>
>  1
>
>  1000
>
>  1
>
>  
>
> - 
>
> - 
>
>  false
>
>  1000
>
>  1000
>
>  2147483647
>
>  1
>
> - 
>
>  true
>
>  
>
>
>
>
>
>
>
>
>
>
>
> Yatir Ben-shlomo | eBay, Inc. | Classification Track, Shopping.com
> (Israel) | w: +972-9-892-1373 |  email: [EMAIL PROTECTED] |
>
>
>
>


*Very* slow Commit after upgrading to solr 1.3

2008-09-29 Thread Ben Shlomo, Yatir
Hi!

 

I am running on Windows 64-bit ...
I have upgraded to solr 1.3 in order to use the distributed search.

I haven't changed the solrConfig and the schema xml files during the
upgrade.

I am indexing ~ 350K documents (each one is about 0.5 KB in size)

The indexing takes a reasonable amount of time (350 seconds)

See tomcat log:

INFO: {add=[8x-wbTscWftuu1sVWpdnGw==, VOu1eSv0obBl1xkj2jGjIA==,
YkOm-nKPrTVVVyeCZM4-4A==, rvaq_TyYsqt3aBc0KKDVbQ==,
9NdzWXsErbF_5btyT1JUjw==, ...(398728 more)]} 0 349875

 

But when I commit it takes more than an hour! (5000 seconds! The
optimize after the commit took 14 seconds.)

INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)

 

P.S. It's not a machine problem; I moved to another machine and the same
thing happened.


I noticed something very strange during the time I wait for the commit:

While the solr index is 210 MB in size,

in the Windows task manager I noticed that the java process is making a
HUGE amount of IO reads:

It reads more than 350 GB! (which takes a lot of time)

The process is constantly taking 25% of the CPU resources.

All my autowarmCount settings in solrconfig do not exceed 256...

 

Any more ideas to check?

Thanks.

 

 

 

Here is part of my solrConfig file:

-   < - 

-  

  false 

  1000 

  1000 

  2147483647 

  1 

  1000 

  1 

  

- 

-  

  false 

  1000 

  1000 

  2147483647 

  1 

-  

  true 

  

 

 

 

 

 

Yatir Ben-shlomo | eBay, Inc. | Classification Track, Shopping.com
(Israel) | w: +972-9-892-1373 |  email: [EMAIL PROTECTED] |

 



RE: help required: how to design a large scale solr system

2008-09-24 Thread Ben Shlomo, Yatir
Thanks, Mark!
Do you have any comment regarding the performance differences between
indexing TSV files as opposed to directly indexing each document via
HTTP POST?
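
(For comparison, a sketch of what the two options look like against a local
Solr instance; the file name, field names, port and values here are
hypothetical. The CSV/TSV handler streams a whole file per request, while
the XML update handler adds one or more documents per POST.)

    curl "http://localhost:8983/solr/update/csv?stream.file=docs.tsv&separator=%09&fieldnames=id,title,body&commit=true"

    curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
         --data-binary '<add><doc><field name="id">doc1</field><field name="title">hello</field></doc></add>'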

-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 24, 2008 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: help required: how to design a large scale solr system


From my limited experience:

I think you might have a bit of trouble getting 60 mil docs on a single
machine. Cached queries will probably still be *very* fast, but non
cached queries are going to be very slow in many cases. Is that 5
seconds for all queries? You will never meet that on first-run queries
with 60 mil docs on that machine. The light query load might make things
workable... but you're near the limits of a single machine (4 core or not)
with 60 mil. You want to use a very good stopword list... common-term
queries will be killer. The docs being so small will be your only
possible savior if you go the one-machine route - that and cached hits.
You don't have enough RAM to get as much of the filesystem into RAM as
you'd like for 60 mil docs either.
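
For reference, a stopword filter in schema.xml is one line added to the
analyzer of the relevant field type; "stopwords.txt" here stands for
whatever word list you maintain:

    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>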

I think you might try two machines with 30, 3 with 20, or 4 with 15. The
more you spread, even with slower machines, the faster you're likely to
index, which as you say, will take a long time for 60 mil docs (start
today). Multiple machines will help the indexing speed the most for
sure - it's still going to take a long time.

I don't think you will get much advantage using more than one solr
install on a single machine - if you do, that should be addressed in the
code, even with RAID.

So I say, spread if you can. Faster indexing, faster search, easy to
expand later. Distributed search is so easy with solr 1.3, you won't
regret it. I think there is a bug to be addressed if you're needing this
in a week though - in my experience, with distributed search, for every
million docs on a machine beyond the first, you lose a doc in a search
across all machines (i.e. 1 mil on machine 1, 1 million on machine 2, a
*:* search will be missing 1 doc; 10 mil each on 3 machines, a *:*
search will be missing 30). Not a big deal, but could be a concern for
some with picky, look-at-everything customers.

- Mark
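
For reference, a sketch of what a distributed query looks like in Solr 1.3
(host names and ports here are hypothetical); each shard is an ordinary Solr
instance, and the shards parameter fans the query out and merges the results:

    http://host1:8983/solr/select?q=*:*&shards=host1:8983/solr,host2:8983/solr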

Ben Shlomo, Yatir wrote:
> Hi!
>
> I am already using solr 1.2 and happy with it.
>
> In a new project with a very tight deadline (10 development days from
> today) I need to set up a more ambitious system in terms of scale.
> Here is the spec:
>
>  
>
> * I need to index about 60,000,000
> documents 
>
> * Each document has 11 textual fields to be indexed & stored
> and 4 more fields to be stored only 
>
> * Most fields are short (2-14 characters) however 2 indexed
> fields can be up to 1KB and another stored field is up to 1KB 
>
> * On average every document is about 0.5 KB to be stored and
> 0.4KB to be indexed 
>
> * The SLA for data freshness is a full nightly re-index ( I
> cannot obtain an incremental update/delete lists of the modified
> documents) 
>
> * The SLA for query time is 5 seconds 
>
> * the number of expected queries is 2-3 queries per second 
>
> * the queries are simple: a combination of Boolean operations and
> name searches (no fancy fuzzy searches or Levenshtein distances, no
> faceting, etc.) 
>
> * I have a 64 bit Dell 2950 4-cpu machine  (2 dual cores )
with
> RAID 10, 200 GB HD space, and 8GB RAM memory 
>
> * The documents are not given to me explicitly - I am given
> raw documents in RAM, one by one, from which I create my document in
> RAM,
> and then I can either http-post it to index it directly or append it to
> a TSV file for later indexing 
>
> * Each document has a unique ID
>
>  
>
> I have a few directions I am thinking about
>
>  
>
> The simple approach
>
> * Have one solr instance that will
index
> the entire document set (from files). I am afraid this will take too
> much time
>
>  
>
> Direction 1
>
> * Create TSV files from all the
> documents - this will take around 3-4 hours 
>
> * Have all the documents partitioned
> into several subsets (how many should I choose? ) 
>
> * Have multiple solr instances on the
> same machine 
>
> * Let each solr instance concurrently
> index the appropriate subset 
>
> * At the end merge all the indices
using
> the IndexMergeTool - (how much time will it take ?)
>
>  
>
> Direction 2
>
> * Like  the previous but

help required: how to design a large scale solr system

2008-09-23 Thread Ben Shlomo, Yatir
Hi!

I am already using solr 1.2 and happy with it.

In a new project with a very tight deadline (10 development days from
today) I need to set up a more ambitious system in terms of scale.
Here is the spec:

 

* I need to index about 60,000,000
documents 

* Each document has 11 textual fields to be indexed & stored
and 4 more fields to be stored only 

* Most fields are short (2-14 characters) however 2 indexed
fields can be up to 1KB and another stored field is up to 1KB 

* On average every document is about 0.5 KB to be stored and
0.4KB to be indexed 

* The SLA for data freshness is a full nightly re-index ( I
cannot obtain an incremental update/delete lists of the modified
documents) 

* The SLA for query time is 5 seconds 

* the number of expected queries is 2-3 queries per second 

* the queries are simple: a combination of Boolean operations and
name searches (no fancy fuzzy searches or Levenshtein distances, no
faceting, etc.) 

* I have a 64 bit Dell 2950 4-cpu machine  (2 dual cores ) with
RAID 10, 200 GB HD space, and 8GB RAM memory 

* The documents are not given to me explicitly - I am given
raw documents in RAM, one by one, from which I create my document in
RAM,
and then I can either http-post it to index it directly or append it to
a TSV file for later indexing 

* Each document has a unique ID

 

I have a few directions I am thinking about

 

The simple approach

* Have one solr instance that will index
the entire document set (from files). I am afraid this will take too
much time

 

Direction 1

* Create TSV files from all the
documents - this will take around 3-4 hours 

* Have all the documents partitioned
into several subsets (how many should I choose? ) 

* Have multiple solr instances on the
same machine 

* Let each solr instance concurrently
index the appropriate subset 

* At the end merge all the indices using
the IndexMergeTool - (how much time will it take ?)
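
For reference, a sketch of running Lucene's contrib IndexMergeTool over the
per-instance indexes (the jar names and directory paths here are assumptions;
the first argument is the target merged index, the rest are the inputs):

    java -cp lucene-core.jar:lucene-misc.jar \
         org.apache.lucene.misc.IndexMergeTool /indexes/merged /indexes/part1 /indexes/part2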

 

Direction 2

* Like  the previous but instead of
using the IndexMergeTool , use distributed search with shards (upgrading
to solr 1.3)

 

Direction 3,4

* Like previous directions only avoid
using TSV files at all and directly index the documents from RAM

Questions:

* Which direction do you recommend in order to meet the SLAs in
the fastest way? 

* Since I have RAID on the machine can I gain performance by
using multiple solr instances on the same machine or only multiple
machines will help me 

* What's the minimal number of machines I should require (I
might get more weaker machines) 

* How many concurrent indexers are recommended? 

* Do you agree that the bottleneck is the indexing time?

Any help is appreciated 

Thanks in advance

yatir

 



RE: solr not finding all results

2007-10-15 Thread Ben Shlomo, Yatir
Did you try adding a backslash to escape the "-" in Geckoplp4-M
(Geckoplp4\-M)?
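
(A sketch of what that escaped request would look like, using the host and
parameters from the query quoted below; the backslash is URL-encoded as %5C.
Quoting the term as a phrase, q="Geckoplp4-M", is another way to keep the
dash from being parsed as query syntax.)

    http://localhost:9020/solr/select/?q=Geckoplp4%5C-M&version=2.2&start=0&rows=10&indent=on&fl=comments,id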


-Original Message-
From: Kevin Lewandowski [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 12, 2007 9:40 PM
To: solr-user@lucene.apache.org
Subject: solr not finding all results

I've found an odd situation where solr is not returning all of the
documents that I think it should. A search for "Geckoplp4-M" returns 3
documents but I know that there are at least 100 documents with that
string.

Here is an example query for that phrase and the result set:
http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&row
s=10&indent=on&fl=comments,id



 0
 0
 
  10
  0
  on
  comments,id
  Geckoplp4-M
  2.2
 


 
  Geckoplp4-M
  m2816500
 
 
  toptrax recordings. Same tracks.
Geckoplp4-M
  m2816544
 
 
  Geckoplp4-M
  m2815903
 



Now here's an example of a search for two documents that I know have
that string, but were not returned in the previous search:
http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&vers
ion=2.2&start=0&rows=10&indent=on&fl=id,comments



 0
 1
 
  10
  0
  on
  id,comments
  id:m2816615 OR id:m2816611
  2.2
 


 
  Geckoplp4-M
  m2816611
 
 
  Geckoplp4-M
  m2816615
 



Here is the definition for the "comments" field:


And here is the definition for a "text" field:

  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  


Any ideas? Am I doing something wrong?

thanks,
Kevin


I can't delete, why?

2007-09-25 Thread Ben Shlomo, Yatir
Hi!
I know I can delete multiple docs with the following:
mediaId:(6720 OR 6721 OR  )

My question is: can I do something like this?
languageId:123 AND manufacturer:456 
(It does not work for me and I didn't forget to commit)


How can I do it? With a copy field?
languageIdmanufacturer:123456
Thanks
yatir
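
For reference, a sketch of the delete-by-query request being described,
posted to the update handler (host and port here are assumptions; whether a
multi-clause Boolean query is honored there is exactly the question above):

    curl http://localhost:8983/solr/update -H "Content-Type: text/xml" \
         --data-binary '<delete><query>languageId:123 AND manufacturer:456</query></delete>'

    curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<commit/>'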


solved: querying UTF-8 encoded CSV files

2007-08-21 Thread Ben Shlomo, Yatir
My problem is resolved:

The problem happened on Tomcat running on Windows XP
when indexing UTF-8-encoded CSV files.

The conclusion is that setting URIEncoding="UTF-8" in the <Connector> section
in server.xml is not enough.

I also needed to add -Dfile.encoding=UTF-8 to Tomcat's Java startup options
(in catalina.bat).

yatir
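
A sketch of the two changes described (the Connector attributes other than
URIEncoding are illustrative and will differ per install):

    <!-- server.xml -->
    <Connector port="8080" URIEncoding="UTF-8"/>

    rem catalina.bat
    set JAVA_OPTS=%JAVA_OPTS% -Dfile.encoding=UTF-8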

____

From: Ben Shlomo, Yatir [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 20, 2007 6:40 PM
To: solr-user@lucene.apache.org
Subject: problem with querying solr after indexing UTF-8 encoded CSV files

 

Hi!

 

I have utf-8 encoded data inside a csv file (actually it’s a tab separated file 
- attached)

I can index it with no apparent errors

I did not forget to set this in my tomcat configuration

 

 


 
   

 

When I query  a document using the UTF-8 text I get zero matches: 

 

   


  0 

  0 


  on 

  0 

יתיר // Note that - I can see the correct UTF-8 
text in it (hebrew characters)

  10 

  2.2 

  

  

   

  

 

 

When I observe this text in the response by querying for *:*

I notice that the text does not appear as desired: יתיר instead of יתיר

Do you have any ideas?

Thanks…

 

Here is the response :

 

   


  0 

  0 


  on 

  0 

  *:* 

  10 

  2.2 

  

  


  1 

  desc is a very good camera 

  display is יתיר ABC res123  

  1 

  1 

  ABC 

   res123  

  C123 

  123456 

  72900010123 

  

  

  

 

 

yatir



problem with querying solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Ben Shlomo, Yatir
Hi!

 

I have utf-8 encoded data inside a csv file (actually it’s a tab separated file 
- attached)

I can index it with no apparent errors

I did not forget to set this in my tomcat configuration

 

 


 
   

 

When I query  a document using the UTF-8 text I get zero matches: 

 

   

- 

  

- 

  

  0 

  0 

- 

  

  on 

  0 

יתיר // Note that - I can see the correct UTF-8 
text in it (hebrew characters)

  10 

  2.2 

  

  

   

  

 

 

When I observe this text in the response by querying for *:*

I notice that the text does not appear as desired: יתיר instead of יתיר

Do you have any ideas?

Thanks…

 

Here is the response :

 

   

- 

  

- 

  

  0 

  0 

- 

  

  on 

  0 

  *:* 

  10 

  2.2 

  

  

- 

  

- 

  

  1 

  desc is a very good camera 

  display is יתיר ABC res123  

  1 

  1 

  ABC 

   res123  

  C123 

  123456 

  72900010123 

  

  

  

 

 

yatir



RE: question: how to divide the indexing into sperate domains

2007-08-11 Thread Ben Shlomo, Yatir
Thanks yonik!

I do have some unused fields inside the csv file.
But they are not empty.
They are numeric they can be anything between 0 to 10,000
Can I do something like
f.unused.map=*:98765 

yatir

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, August 09, 2007 10:41 PM
To: solr-user@lucene.apache.org
Subject: Re: question: how to divide the indexing into sperate domains

Hmmm, I think you can map an empty (zero length) value to something else
via
f.foo.map=:something
But that column does currently need to be there in the CSV.

Specifying default values in a per-request basis is interesting, and
something we could perhaps support in the future.
The quickest way to index your data right now would probably be to
change the file, adding another value at the end of each file.  I
think it could even be an empty value (just add a "," at the end of
each line), and then you could map that via
f.domain.map=:98765
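
A sketch of that request, reusing the host, file and field names from the
curl example quoted below and assuming an empty trailing column named
"domain" was appended to each line:

    curl "http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=field1,field2,domain&f.domain.map=:98765"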

btw, 300M records is a lot for one Solr instance... I hope you've got
a big box with a lot of memory, and aren't too concerned with your
query latency.  Otherwise you can do some partitioning by domain.

-Yonik

On 8/9/07, Ben Shlomo, Yatir <[EMAIL PROTECTED]> wrote:
> Hi!
>
> say I have 300 csv files that I need to index.
>
> Each one holds millions of lines (each line is a few fields separated
by
> commas)
>
> Each csv file represents a different domain of data (e.g., file1 is
> computers, file2 is flowers, etc.)
>
> There is no indication of the domain ID in the data inside the csv
file
>
>
>
> When I search I would like to specify the id of a specific domain
>
> And I want solr to search only in this domain - to save time and
reduce
> the number of matches
>
> I need to specify during indexing - the domain id of the csv file
being
> indexed
>
> How do I do it ?
>
>
>
>
>
> Thanks
>
>
>
>
>
>
>
> p.s.
>
> I wish I could index like this:
>
> curl
>
http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=fi
> eld1,field2&f.domain.value=98765
>
<http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=f
> ield1,field2&f.domain.value=98765>  (where 98765 is the domain id for
> ths specific csv file)
>
>


question: how to divide the indexing into sperate domains

2007-08-09 Thread Ben Shlomo, Yatir
Hi!

say I have 300 csv files that I need to index. 

Each one holds millions of lines (each line is a few fields separated by
commas)

Each csv file represents a different domain of data (e.g., file1 is
computers, file2 is flowers, etc.)

There is no indication of the domain ID in the data inside the csv file

 

When I search I would like to specify the id of a specific domain

And I want solr to search only in this domain - to save time and reduce
the number of matches

I need to specify during indexing - the domain id of the csv file being
indexed

How do I do it ?

 

 

Thanks 

 

 

 

p.s. 

I wish I could index like this:

curl
http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=fi
eld1,field2&f.domain.value=98765
  (where 98765 is the domain id for
ths specific csv file)