Solr nightly build failure
init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build

compile-common:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/common
    [javac] Compiling 31 source files to /tmp/apache-solr-nightly/build/common
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/core
    [javac] Compiling 274 source files to /tmp/apache-solr-nightly/build/core
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
    [javac] Compiling 22 source files to /tmp/apache-solr-nightly/build/client/solrj
    [javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

compile-solrj:
    [javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj

compileTests:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
    [javac] Compiling 78 source files to /tmp/apache-solr-nightly/build/tests
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
junit:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
    [junit] Running org.apache.solr.BasicFunctionalityTest
    [junit] Tests run: 25, Failures: 0, Errors: 0, Time elapsed: 23.125 sec
    [junit] Running org.apache.solr.ConvertedLegacyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.71 sec
    [junit] Running org.apache.solr.DisMaxRequestHandlerTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 5.308 sec
    [junit] Running org.apache.solr.EchoParamsTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.063 sec
    [junit] Running org.apache.solr.OutputWriterTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.18 sec
    [junit] Running org.apache.solr.SampleTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.824 sec
    [junit] Running org.apache.solr.analysis.HTMLStripReaderTest
    [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.653 sec
    [junit] Running org.apache.solr.analysis.TestBufferedTokenStream
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.631 sec
    [junit] Running org.apache.solr.analysis.TestCapitalizationFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.421 sec
    [junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.415 sec
    [junit] Running org.apache.solr.analysis.TestKeepWordFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.504 sec
    [junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.242 sec
    [junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.435 sec
    [junit] Running org.apache.solr.analysis.TestPhoneticFilter
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.776 sec
    [junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.867 sec
    [junit] Running org.apache.solr.analysis.TestSynonymFilter
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.276 sec
    [junit] Running org.apache.solr.analysis.TestTrimFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.736 sec
    [junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 7.932 sec
    [junit] Running org.apache.solr.common.SolrDocumentTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.07 sec
    [junit] Running org.apache.solr.common.params.SolrParamTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.089 sec
    [junit] Running org.apache.solr.common.util.ContentStreamTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.221 sec
    [junit] Running org.apache.solr.common.util.IteratorChainTest
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.051 sec
    [junit] Running org.apache.solr.common.util.TestXMLEscaping
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.059 sec
    [junit] Running
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12566951#action_12566951 ]

Thomas Peuss commented on SOLR-127:
-----------------------------------

Think of two scenarios:

* An AJAX-ified browser client sending requests to Solr. Caching of unchanged data in the client and in corporate caching proxies speeds things up.
* A cluster of Solr servers behind a load balancer with caching functionality. The middleware sends requests to Solr through the load balancer. Repeated requests for unchanged data are answered directly from the LB cache without putting load on the Solr servers. This is, for example, our scenario.

Our code works fine with BlueCoat WebCache, the Apache HTTPD proxy cache, the Squid proxy cache, and many other solutions _because_ we are following the standards here. So I don't really get the point of your comment. Besides that, you can completely disable this HTTP header stuff in solrconfig.xml if you don't want it.

Make Solr more friendly to external HTTP caches
-----------------------------------------------

Key: SOLR-127
URL: https://issues.apache.org/jira/browse/SOLR-127
Project: Solr
Issue Type: Wish
Reporter: Hoss Man
Assignee: Hoss Man
Fix For: 1.3
Attachments: CacheUnitTest.patch, CacheUnitTest.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch

an offhand comment I saw recently reminded me of something that really bugged me about the search solution i used *before* Solr -- it didn't play nicely with HTTP caches that might be sitting in front of it.
at the moment, Solr doesn't put particularly useful info in the HTTP response headers to aid in caching (ie: Last-Modified), responds to all HEAD requests with a 400, and doesn't do anything special with If-Modified-Since.

At the very least, we can set a Last-Modified based on when the current IndexReader was opened (if not the Date on the IndexReader) and use the same info to determine how to respond to If-Modified-Since requests.

(for the record, i think the reason this hasn't occurred to me in the 2+ years i've been using Solr, is because with the internal caching, i've yet to need to put a proxy cache in front of Solr)

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
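A minimal sketch of the conditional-GET logic being proposed (the class and method names are hypothetical, not Solr's actual code): compare the If-Modified-Since value against the time the current IndexReader was opened, truncating to whole seconds since HTTP dates have one-second resolution.

```java
// Hypothetical helper illustrating the proposed If-Modified-Since check;
// not actual Solr code. Decide whether a request can be answered with
// 304 Not Modified based on when the current IndexReader was opened.
public class HttpCacheSketch {
    /**
     * HTTP dates have one-second resolution, so truncate the reader's
     * open time to seconds before comparing against If-Modified-Since.
     */
    public static boolean notModified(long readerOpenedMillis, long ifModifiedSinceMillis) {
        long lastModifiedSec = readerOpenedMillis / 1000;
        long sinceSec = ifModifiedSinceMillis / 1000;
        return lastModifiedSec <= sinceSec;
    }

    public static void main(String[] args) {
        // Reader opened before the client's cached copy -> can send 304
        System.out.println(notModified(5000L, 7000L)); // true
        // Index reopened after the client's cached copy -> full 200 response
        System.out.println(notModified(9000L, 7000L)); // false
    }
}
```

The same timestamp would be sent as the Last-Modified header on full responses, so clients and proxies have something to revalidate against.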
concurrency while indexing
Hi all,

I have the following use case: one Solr instance which constantly receives add/commit calls from 3 different clients.

The machine:
Model: HP ProLiant DL360
Memory: 2 GB
CPU: 1 Intel Xeon 3.02 GHz
Disk: 2 x 36 GB SCSI in RAID

I need to raise the number of clients to about 10. Can this be a problem for the indexing machine?

salu2
--
Thorsten Scherler
thorsten.at.apache.org
Open Source Java consulting, training and solutions
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567067#action_12567067 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

In my configuration I do not need SOLR caching at all, but I use HTTP caching more effectively. The HTTPD memory and disk cache is used between the client and the middleware. There is no caching between the middleware and SOLR. The middleware responds to HTTPD with 304 if necessary, with a correct Last-Modified etc., and the request does not reach SOLR. This caching configuration works fine with AJAX too, without SOLR's caching headers. I've seen unnecessary extra work with this implementation... taking a long time... and tried to point out some of the meanings of the response codes (for the Web).
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567072#action_12567072 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

Regarding an HTTP-caching load balancer between SOLR and the middleware: you need to deal with an additional internal HTTP cache at the middleware. In most cases the middleware generates content from different sources and can't reroute an If-Modified-Since request to SOLR without internal caching. For instance, if you are using SOLRJ, you have to implement an *additional* cache for SolrDocument...
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567111#action_12567111 ]

Yonik Seeley commented on SOLR-342:
-----------------------------------

Yikes! Thanks for the report Will. It certainly sounds like a Lucene issue to me (esp because removal of this patch fixes things... that means it only happens under certain Lucene settings). Could you perhaps try the very latest Lucene trunk (there were some seemingly unrelated fixes recently)?

Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
-----------------------------------------------------------------------------------------------

Key: SOLR-342
URL: https://issues.apache.org/jira/browse/SOLR-342
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz

LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need a trunk version of Lucene (or wait for the next official release of Lucene). A side effect of this is that Lucene's new, faster StandardTokenizer will also be incorporated. We also need to think about how we want to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread).
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567077#action_12567077 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

Thomas, Walter,

Finally I agree, thanks! The middleware should not send/reroute If-Modified-Since, and should not implement an internal cache (as in the counter-example I provided): with caching enabled, it will simply retrieve the cached content. I do not agree with 400; it is an opening for DoS attacks. A query parsing error should be a 200 with caching response codes. Of course, I know RFC 2616.
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567068#action_12567068 ]

Walter Underwood commented on SOLR-127:
---------------------------------------

Two reasons to do HTTP caching for Solr: First, Solr is HTTP and needs to implement that correctly. Second, caches are much harder to implement and test than the cache information in HTTP. HTTP caches already exist and are well tested, so the implementation cost is zero and deployment is very easy.

The HTTP spec already covers which responses should be cached. A 400 response may only be cached if it includes explicit cache control headers which allow that. See RFC 2616.

We are using a caching load balancer and caching in Apache front ends to Tomcat. We see an increase of more than 2X in the capacity of our search farm.

I would recommend against Solr-specific cache information in the XML part of the responses. Distributed caching is extremely difficult to get right. Around 25% of the HTTP 1.1 spec is devoted to caching and there are still grey areas.
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567064#action_12567064 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

I agree. A caching load balancer between SOLR and the app servers is an excellent idea, and it can be a black box without any knowledge of the SOLR API. AJAX can use the web browser's internal cache; FLEX probably too...

Question: do we need caching of static (unchanged) content from SOLR, such as a 400: Query parsing error?..
[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches
[ https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567081#action_12567081 ]

Fuad Efendi commented on SOLR-127:
----------------------------------

Fortunately, we are not using 404 when trying to retrieve a removed document... In the initial design (I believe) the SOLR developers simply wrapped all exceptions into a 400, and an empty result set is not an exception.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567099#action_12567099 ]

Will Johnson commented on SOLR-342:
-----------------------------------

i think we're running into a very serious issue with trunk + this patch. either the document summaries are not matched or the overall matching is 'wrong'. i did find this in the lucene jira:

LUCENE-994: Note that these changes will break users of ParallelReader because the parallel indices will no longer have matching docIDs. Such users need to switch IndexWriter back to flushing by doc count, and switch the MergePolicy back to LogDocMergePolicy. It's likely also necessary to switch the MergeScheduler back to SerialMergeScheduler to ensure deterministic docID assignment.

we're seeing rather consistent bad results, but only after 20-30k documents and multiple commits, and wondering if anyone else is seeing anything. i've verified that the results are bad even through Luke, which would seem to remove the search side of the solr equation. the basic test case is to search for title:foo and get back documents that only have title:bar. we're going to start on a unit test, but given the document counts and the corpus we're testing against it may be a while, so i thought i'd ask to see if anyone had any hints. removing this patch seems to remove the issue, so it doesn't appear to be a lucene problem.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567152#action_12567152 ]

Yonik Seeley commented on SOLR-342:
-----------------------------------

Thanks Will. My guess at this point is a merging bug in Lucene, so you might be able to reproduce it by forcing more merges. Make mergeFactor=2 and lower how many docs it takes to do a merge (set maxBufferedDocs to 2, or set ramBufferSizeMB to 1).
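In solrconfig.xml terms, the merge-forcing settings suggested above would go in the indexing section; a sketch (element names as used in this era's solrconfig, values chosen only to stress merging, not for production):

```xml
<indexDefaults>
  <!-- Deliberately tiny values so merges happen constantly, to help
       reproduce merge-related bugs quickly. Not production settings. -->
  <mergeFactor>2</mergeFactor>
  <maxBufferedDocs>2</maxBufferedDocs>
  <ramBufferSizeMB>1</ramBufferSizeMB>
</indexDefaults>
```

With mergeFactor=2, every other flush triggers a cascade of merges, which maximizes the chance of hitting a merge-time docID bug.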
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567198#action_12567198 ]

Will Johnson commented on SOLR-342:
-----------------------------------

we have:

<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>64</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>

and i'm working on a unit test, but just adding a few terms per doc doesn't seem to trigger it, at least not 'quickly.'
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12567184#action_12567184 ]

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

Also, is field collapse going to be a part of the upcoming Solr 1.3 release, or will we need to run a patch on it?

Field collapsing
----------------

Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Attachments: field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch

This patch includes a new feature called field collapsing, used to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr 1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
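For illustration, a request exercising the three parameters described in the patch might look like this (host, port, and the `site` field name are made up for the example):

```
http://localhost:8983/solr/select?q=foo&collapse.field=site&collapse.type=adjacent&collapse.max=1
```

Here adjacent duplicates on `site` beyond the first result would be collapsed into a single entry.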
[jira] Updated: (SOLR-475) multi-valued faceting via un-inverted field
[ https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-475:
------------------------------

    Attachment: UnInvertedField.java

Prototype attached. This is completely untested code, and is still missing the Solr interface + caching. The approach is described in the comments (cut-n-pasted here). Any thoughts or comments on the approach? I may not have time to immediately work on this (fix the bugs, add tests, hook up to Solr, add caching of the un-inverted field, etc.), so additional contributions in this direction are welcome!

{code}
/**
 * Final form of the un-inverted field:
 * Each document points to a list of term numbers that are contained in that document.
 *
 * Term numbers are in sorted order, and are encoded as variable-length deltas from the
 * previous term number.  Real term numbers start at 2 since 0 and 1 are reserved.  A
 * term number of 0 signals the end of the termNumber list.
 *
 * There is a single int[maxDoc()] which either contains a pointer into a byte[] for
 * the termNumber lists, or directly contains the termNumber list if it fits in the 4
 * bytes of an integer.  If the first byte in the integer is 1, the next 3 bytes
 * are a pointer into a byte[] where the termNumber list starts.
 *
 * There are actually 256 byte arrays, to compensate for the fact that the pointers
 * into the byte arrays are only 3 bytes long.  The correct byte array for a document
 * is a function of its id.
 *
 * To save space and speed up faceting, any term that matches enough documents will
 * not be un-inverted... it will be skipped while building the un-inverted field structure,
 * and will use a set intersection method during faceting.
 *
 * To further save memory, the terms (the actual string values) are not all stored in
 * memory, but a TermIndex is used to convert term numbers to term values only
 * for the terms needed after faceting has completed.  Only every 128th term value
 * is stored, along with its corresponding term number, and this is used as an
 * index to find the closest term and iterate until the desired number is hit (very
 * much like Lucene's own internal term index).
 */
{code}

> multi-valued faceting via un-inverted field
> -------------------------------------------
>
>                 Key: SOLR-475
>                 URL: https://issues.apache.org/jira/browse/SOLR-475
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Attachments: UnInvertedField.java
>
> Facet multi-valued fields via a counting method (like the FieldCache method) on an un-inverted representation of the field. For each doc, look at its terms and increment a count for that term.

-- 
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
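The variable-length delta encoding described in the comment above (sorted term numbers starting at 2, deltas in Lucene-style vInt form, a delta of 0 terminating the list) can be sketched as follows. This is an illustration of the encoding scheme only, not code from the attached prototype:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class VIntDelta {
    // Encode a sorted list of term numbers (all >= 2) as variable-length
    // deltas from the previous term number, terminated by a 0 byte.
    static byte[] encode(int[] termNums) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int t : termNums) {
            int delta = t - prev;   // always >= 1 for a sorted, distinct list
            prev = t;
            // vInt: low 7 bits first, high bit set means "more bytes follow"
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        out.write(0);               // a term number delta of 0 ends the list
        return out.toByteArray();
    }

    static List<Integer> decode(byte[] buf) {
        List<Integer> terms = new ArrayList<>();
        int pos = 0, prev = 0;
        while (true) {
            int delta = 0, shift = 0, b;
            do {
                b = buf[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (delta == 0) break;  // terminator
            prev += delta;
            terms.add(prev);
        }
        return terms;
    }

    public static void main(String[] args) {
        int[] terms = {2, 5, 300, 100000};
        System.out.println(decode(encode(terms))); // [2, 5, 300, 100000]
    }
}
```

Because deltas between sorted, distinct term numbers are always at least 1, the 0 byte is unambiguous as a terminator, which is what lets a short list pack directly into the 4 bytes of the int[maxDoc()] slot.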
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567207#action_12567207 ]

Grant Ingersoll commented on SOLR-342:
--------------------------------------

You mentioned ParallelReader; are you using that, or any other patches?

{quote}
problem to happen before we get 20-30k large docs
{quote}

What is "large" in your terms?

> Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-342
>                 URL: https://issues.apache.org/jira/browse/SOLR-342
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz
>
> LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need the trunk version of Lucene (or wait for the next official release of Lucene). A side effect of this is that Lucene's new, faster StandardTokenizer will also be incorporated. We also need to think about how we want to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread).

-- 
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567140#action_12567140 ]

Yonik Seeley commented on SOLR-342:
-----------------------------------

Will, are you using term vectors anywhere, or any customizations to Solr (at the Lucene level)? When you say document summaries are not matched, do you mean that the incorrect documents are matched, or that the correct documents are matched but just the highlighting is wrong?
RE: /example/solr/bin is empty in trunk
> Try "ant example" in the base dir to build the example.

Thanks, it works.
[jira] Created: (SOLR-475) multi-valued faceting via un-inverted field
multi-valued faceting via un-inverted field
-------------------------------------------

                 Key: SOLR-475
                 URL: https://issues.apache.org/jira/browse/SOLR-475
             Project: Solr
          Issue Type: New Feature
            Reporter: Yonik Seeley

Facet multi-valued fields via a counting method (like the FieldCache method) on an un-inverted representation of the field. For each doc, look at its terms and increment a count for that term.

-- 
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
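The counting method proposed above reduces to a simple loop once the field has been un-inverted into a doc-to-terms mapping. A minimal sketch (the `docTerms` list stands in for the un-inverted field; names are illustrative, not from the prototype):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountingFacet {
    // For each doc matching the query, walk its terms and increment a
    // counter per term — the multi-valued analogue of FieldCache faceting.
    static Map<String, Integer> facet(List<List<String>> docTerms, BitSet matches) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            for (String term : docTerms.get(doc)) {
                counts.merge(term, 1, Integer::sum); // one increment per (doc, term)
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docTerms = List.of(
            List.of("red", "blue"),  // doc 0
            List.of("blue"),         // doc 1
            List.of("green"));       // doc 2
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(1);              // docs 0 and 1 match the query
        System.out.println(facet(docTerms, matches).get("blue")); // 2
    }
}
```

The cost is proportional to the total number of (doc, term) pairs in the match set, which is why the prototype special-cases very common terms with set intersections instead.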
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567235#action_12567235 ]

Will Johnson commented on SOLR-342:
-----------------------------------

We're using SolrCore in terms of:

{code}
SolrCore core = new SolrCore(foo, dataDir, solrConfig, solrSchema);
UpdateHandler updateHandler = core.getUpdateHandler();
updateHandler.addDoc(command);
{code}

which is a bit more low-level than normal. However, when we flipped back to Solr trunk + Lucene 2.3 everything was fine, so it leads me to believe that we are OK in that respect. I was going to try to reproduce with Lucene directly as well, but that too is a bit outside the scope of what I have time for at the moment. And we're not getting any exceptions, just bad search results.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567218#action_12567218 ]

Will Johnson commented on SOLR-342:
-----------------------------------

We're not using ParallelReader, but we are using direct core access instead of going over HTTP. As for doc size, we're indexing Wikipedia but creating a number of extra fields. They are only "large" in comparison to the 'large volume' tests I've seen in most of the Solr and Lucene tests.

- Will
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567224#action_12567224 ]

Oleg Gnatovskiy commented on SOLR-236:
--------------------------------------

OK, I think I have the first issue figured out. If the current result set (let's say the first 10 rows) doesn't have the field that we are collapsing on, the counts don't show up. Is that correct?
Re: concurrency while indexing
On Feb 8, 2008 3:53 AM, Thorsten Scherler [EMAIL PROTECTED] wrote:
> I have the following use case: one Solr instance which receives add/commit calls constantly from 3 different clients.
> The machine:
> Model: HP ProLiant DL 360
> Memory: 2 GB
> CPU: 1 Intel Xeon 3.02 GHz
> Disk: 2 x 36 GB SCSI in RAID
> I need to raise the number of clients to about 10; can this be a problem for the indexing machine?

I'd stop the clients from doing commits themselves unless it's really necessary, and use some form of time-based autocommit (see the example solrconfig.xml).

-Yonik
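[For reference, the time-based autocommit Yonik mentions is configured in the update handler section of solrconfig.xml; the thresholds below are illustrative values, not recommendations:]

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit after this many pending documents... -->
    <maxDocs>10000</maxDocs>
    <!-- ...or after this many milliseconds, whichever comes first -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

With this in place, clients can send adds freely and let the server batch commits, instead of each of the 10 clients issuing its own commit calls.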
Re: /example/solr/bin is empty in trunk
On Feb 8, 2008 1:13 AM, Fuad Efendi [EMAIL PROTECTED] wrote:
> Is it correct?.. I want to try distribution/replication in v.2.3

Try "ant example" in the base dir to build the example.

-Yonik
[jira] Updated: (SOLR-475) multi-valued faceting via un-inverted field
[ https://issues.apache.org/jira/browse/SOLR-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-475:
------------------------------

    Attachment: UnInvertedField.java

Fix single-line oops.
[jira] Commented: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567221#action_12567221 ]

Grant Ingersoll commented on SOLR-342:
--------------------------------------

Direct core meaning embedded, right? It's interesting, b/c I have done a fair amount of Lucene 2.3 testing w/ Wikipedia (nothing like a free, fairly large dataset). Can you reproduce the problem using Lucene directly? (Have a look at contrib/benchmark for a way to get Lucene/Wikipedia up and running quickly.) Also, are there any associated exceptions anywhere in the chain? Or is it just that your index is bad? Are you starting from a clean index or updating an existing one?