SolrJ and DataImportHandler nightly javadocs on website

2008-07-30 Thread Shalin Shekhar Mangar
Hi,

Currently the nightly build copies only the main javadocs (api folder) to
the Solr site. It should do the same for SolrJ and DataImportHandler. I
believe this is done through Hudson, but I don't have a Hudson account.

Can someone with the appropriate privileges do that? I can do it myself if I
get access to Hudson. Once it is done, those two javadocs need to be linked
from the Solr website too.

-- 
Regards,
Shalin Shekhar Mangar.


[jira] Created: (SOLR-670) UpdateHandler must provide a rollback feature

2008-07-30 Thread Noble Paul (JIRA)
UpdateHandler must provide a rollback feature
-

 Key: SOLR-670
 URL: https://issues.apache.org/jira/browse/SOLR-670
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Noble Paul


Lucene's IndexWriter already has a rollback method. There should be a 
counterpart in _UpdateHandler_ so that users can perform a rollback over HTTP.
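
A minimal sketch of what such a counterpart might look like, assuming the
handler holds the IndexWriter directly (class and field names here are
illustrative, not the actual Solr API). Over HTTP this would presumably be
exposed like <commit/>, e.g. as a <rollback/> update command.

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;

// Hypothetical sketch, not the actual Solr API: delegate to Lucene's
// IndexWriter.rollback() to discard everything buffered since the last commit.
public abstract class RollbackCapableUpdateHandler {
  protected IndexWriter writer;

  public void rollback() throws IOException {
    writer.rollback();   // drop all uncommitted adds/deletes
    writer = null;       // force a fresh writer on the next update
  }
}
{code}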

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-669) SOLR currently does not support caching for (Query, FacetFieldList)

2008-07-30 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618578#action_12618578
 ] 

Fuad Efendi commented on SOLR-669:
--

To confirm: 
- SOLR uses Lucene internals (with caching) only if the field is non-tokenized, 
single-valued, and non-boolean; and SOLR does not have its own cache to store 
the calculated intersections (_faceting_).


> SOLR currently does not support caching for (Query, FacetFieldList)
> ---
>
> Key: SOLR-669
> URL: https://issues.apache.org/jira/browse/SOLR-669
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
>Reporter: Fuad Efendi
>   Original Estimate: 1680h
>  Remaining Estimate: 1680h
>
> It is a huge performance bottleneck, and it explains the large difference 
> between qtime and SolrJ's elapsedTime. I quickly browsed SolrIndexSearcher: it 
> caches only (Key, DocSet/DocList) key-value pairs and it does not have a 
> cache for (Query, FacetFieldList).
> filterCache stores DocList for each 'filter' and is used for constant 
> recalculations...
> This would be a significant performance improvement.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-669) SOLR currently does not support caching for (Query, FacetFieldList)

2008-07-30 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618574#action_12618574
 ] 

Fuad Efendi commented on SOLR-669:
--

This piece of code in SimpleFacets:
{code}
if (sf.multiValued() || ft.isTokenized() || ft instanceof BoolField) {
  // Always use filters for booleans... we know the number of values is very small.
  counts = getFacetTermEnumCounts(searcher, docs, field, offset, limit,
                                  mincount, missing, sort, prefix);
} else {
  // TODO: future logic could use filters instead of the fieldcache if
  // the number of terms in the field is small enough.
  counts = getFieldCacheCounts(searcher, docs, field, offset, limit,
                               mincount, missing, sort, prefix);
}
{code}

- an optimization for single-valued, non-tokenized fields... 'Lucene FieldCache 
to get counts for each unique field value in docs'


We should implement *additional* caching to support _using the FilterCache to 
get the intersection_; the FilterCache stores only DocSets and does not store 
the NamedList of field intersections:

{code}
/**
 * Returns a list of terms in the specified field along with the
 * corresponding count of documents in the set that match that constraint.
 * This method uses the FilterCache to get the intersection count between
 * docs and the DocSet for each term in the filter.
 *
 * @see FacetParams#FACET_LIMIT
 * @see FacetParams#FACET_ZEROS
 * @see FacetParams#FACET_MISSING
 */
public NamedList getFacetTermEnumCounts(SolrIndexSearcher searcher, DocSet docs,
    String field, int offset, int limit, int mincount, boolean missing,
    boolean sort, String prefix) throws IOException {
...
}
{code}
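
A minimal sketch of the kind of cache being proposed: a concurrent map keyed
by an identifier for the query's DocSet plus the facet field, holding the
computed counts. All names here are hypothetical, not existing Solr API, and
the value type is left as Object to stand in for NamedList.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FacetCountCache {
  // Composite key: an identifier for the matching DocSet plus the facet field.
  private static final class Key {
    private final Object docSetKey;
    private final String field;
    Key(Object docSetKey, String field) {
      this.docSetKey = docSetKey;
      this.field = field;
    }
    @Override public boolean equals(Object o) {
      if (!(o instanceof Key)) return false;
      Key k = (Key) o;
      return docSetKey.equals(k.docSetKey) && field.equals(k.field);
    }
    @Override public int hashCode() {
      return 31 * docSetKey.hashCode() + field.hashCode();
    }
  }

  private final Map<Key, Object> counts = new ConcurrentHashMap<Key, Object>();

  /** Returns the cached term counts (a NamedList in Solr), or null. */
  public Object get(Object docSetKey, String field) {
    return counts.get(new Key(docSetKey, field));
  }

  public void put(Object docSetKey, String field, Object termCounts) {
    counts.put(new Key(docSetKey, field), termCounts);
  }
}
{code}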



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Too many open files with DirectUpdateHandlerOptimizeTest

2008-07-30 Thread Grant Ingersoll
For the record, the test was meant to exercise the new partial optimize
capabilities by counting the number of segments available after doing
partial optimizes.


-Grant
On Jul 29, 2008, at 5:58 PM, Tricia Williams wrote:


Works for me!
I don't really understand the purpose of the test, but it looks like
it is meant as a weak stress test.  If the test is significantly
scaled back, is it still accomplishing what it is meant to do?
Should the onus instead be on the developer to meet a minimum
requirement?  Would this same problem crop up in normal (or even
abnormal) usage of Solr deployed on a similar system?


Tricia

Yonik Seeley wrote:

OK, I've now scaled back the test by a factor of 10 (50 segments
instead of 500).
-Yonik

On Tue, Jul 29, 2008 at 4:53 PM, Tricia Williams
<[EMAIL PROTECTED]> wrote:

As of revision 680834 the DirectUpdateHandlerOptimizeTest is still  
failing.

I haven't made any changes to the file handle limit on my machine.

Tricia

Yonik Seeley wrote:

I just committed a fix that will make the test use the compound file
format.  Hopefully that will be sufficient.

-Yonik


[jira] Resolved: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-469.


Resolution: Fixed

Committed revision 681182.

A big thanks to Noble for designing these cool features and to Grant for his 
reviews, feedback and support! Thanks to everybody who helped us by using the 
patch, giving suggestions and pointing out bugs!

Solr rocks!

> Data Import RequestHandler
> --
>
> Key: SOLR-469
> URL: https://issues.apache.org/jira/browse/SOLR-469
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.3
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.3
>
> Attachments: SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469-contrib.patch, SOLR-469-contrib.patch, 
> SOLR-469-contrib.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
> SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other data 
> sources into the Solr index. Think of it as an advanced form of the SqlUpload 
> Plugin (SOLR-103).
> The way it works is as follows.
> * Provide a configuration file (xml) to the Handler which takes in the 
> necessary SQL queries and mappings to a solr schema
>   - It also takes in a properties file for the data source configuration
> * Given the configuration it can also generate the solr schema.xml
> * It is registered as a RequestHandler which can take two commands: 
> do-full-import, do-delta-import
>   - do-full-import - dumps all the data from the database into the 
> index (based on the SQL query in the configuration)
>   - do-delta-import - dumps all the data that has changed since the last 
> import. (We assume a modified-timestamp column in the tables)
> * It provides an admin page
>   - where we can schedule it to be run automatically at regular intervals
>   - It shows the status of the Handler (idle, full-import, delta-import)
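
A minimal sketch of such a configuration file, based only on the description
above (the entity/field names, attributes, and the delta variable are
illustrative assumptions, not confirmed syntax):

{code}
<dataConfig>
  <!-- JDBC data source; per the description this can also come from a properties file -->
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
  <document>
    <!-- SQL query plus the mapping of result columns to solr schema fields -->
    <entity name="item"
            query="select id, name from item"
            deltaQuery="select id from item where updated > '${dataimporter.last_index_time}'">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
{code}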

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12617654#action_12617654
 ] 

funtick edited comment on SOLR-665 at 7/30/08 10:08 AM:


bq. The Solr admin pages will not give you exact measurements. 
Yes, and I do not need exact measurements! It gives me averageTimePerRequest, 
which improved almost 10 times on the production server. Should I write JUnit 
tests and execute them in a single-threaded environment? It would be better to 
use The Grinder, but I don't have the time or spare CPUs.

bq. I've seen throughputs in excess of 400 searches per second. 
But 'searches per second' is not the same as 'average response time'!!!

bq. Are you using highlighting or anything else that might be CPU-intensive at 
all? 
Yes, I am using highlighting. You can see it at http://www.tokenizer.org


bq. I'm guessing that you're caching the results of all queries in memory such 
that no disk access is necessary.
{color:red} But this is another SOLR bug!!! I am using extremely large 
caches, but SOLR still *recalculates* facet intersections. {color} - SOLR-669

bq. A FIFO cache might become a bottleneck itself - if the cache is very large 
and the most frequently accessed item is inserted just after the cache is 
created, all accesses will need to traverse all the other entries before 
getting that item.

- Sorry, I don't understand... yes, if the cache contains 10 entries and the 
'most popular item' is removed... Why 'traverse all the other entries before 
getting that item'? Why would 9 items be less popular (cumulatively) than a 
single one (absolutely)?

You probably mean 'LinkedList traversal', but this is not the case. This is why 
we need to browse the Java source... LinkedHashMap extends HashMap and there is 
no 'traversal':
{code}
// LinkedHashMap.get(): a plain hash lookup followed by recordAccess();
// no list traversal is involved.
public V get(Object key) {
    Entry e = (Entry) getEntry(key);
    if (e == null)
        return null;
    e.recordAccess(this);
    return e.value;
}
{code}


bq. Consider the following case: thread A performs a synchronized put, thread B 
performs an unsynchronized get on the same key. B gets scheduled before A 
completes, the returned value will be undefined.
The returned value is well defined: it is either null or the correct value.

bq. That's exactly the case here - the update thread modifies the map 
structurally! 
Who cares? We are not iterating the map!

bq. That said, if you can show conclusively (e.g. with a profiler) that the 
synchronized access is indeed the bottleneck and incurs a heavy penalty on 
performance, then I'm all for investigating this further.

*What?!!*


Anyway, I believe a simplified ConcurrentLRU backed by a ConcurrentHashMap is 
easier to understand and troubleshoot...

bq. I don't see the point of the static popularityCounter... that looks like a 
bug.
No, it is not a bug. It is effectively a 'checkpoint', like a timer, one timer 
shared by all instances. We could use System.currentTimeMillis() instead, but a 
static volatile long is faster.
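
As I read it, the idea is roughly the following (a hypothetical illustration, 
not the attached code):

{code}
// One static volatile counter shared by all cache instances, used as a
// cheap logical clock ('checkpoint') instead of System.currentTimeMillis().
public class PopularityClock {
  private static volatile long tick = 0;

  public static long now() {
    return tick;   // cheap volatile read on the hot path
  }

  public static void advance() {
    tick++;        // bumped by a single maintenance thread, so no CAS needed
  }
}
{code}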

About the specific use case: yes... if someone gets 0.5-second response times 
for faceted queries, I am very happy... I had 15 seconds before switching to 
FIFO.



[jira] Updated: (SOLR-669) SOLR currently does not support caching for (Query, FacetFieldList)

2008-07-30 Thread Fuad Efendi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fuad Efendi updated SOLR-669:
-

Remaining Estimate: 1680h  (was: 0.03h)
 Original Estimate: 1680h  (was: 0.03h)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-669) SOLR currently does not support caching for (Query, FacetFieldList)

2008-07-30 Thread Fuad Efendi (JIRA)
SOLR currently does not support caching for (Query, FacetFieldList)
---

 Key: SOLR-669
 URL: https://issues.apache.org/jira/browse/SOLR-669
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Fuad Efendi


It is a huge performance bottleneck, and it explains the large difference 
between qtime and SolrJ's elapsedTime. I quickly browsed SolrIndexSearcher: it 
caches only (Key, DocSet/DocList) key-value pairs and it does not have a 
cache for (Query, FacetFieldList).
filterCache stores DocList for each 'filter' and is used for constant 
recalculations...

This would be a significant performance improvement.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-438) Allow multiple stopword files

2008-07-30 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic resolved SOLR-438.
---

Resolution: Fixed

> Allow multiple stopword files
> -
>
> Key: SOLR-438
> URL: https://issues.apache.org/jira/browse/SOLR-438
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Otis Gospodnetic
>Assignee: Otis Gospodnetic
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-438.patch, SOLR-438.patch
>
>
> It wouldn't hurt Solr (StopFilterFactory) to allow one to specify multiple 
> stopword files.
> I've patched Solr to support this, for example:
> <filter class="solr.StopFilterFactory" words="hr_stopwords.txt, hr_stopmorphemes.txt"/>
> I'll upload a patch shortly and commit later this week.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618395#action_12618395
 ] 

Shalin Shekhar Mangar commented on SOLR-469:


Thanks Grant :)

I shall go over the javadocs once more and then commit it.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-469:
--

Assignee: Shalin Shekhar Mangar


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-469:


Assignee: (was: Grant Ingersoll)

Shalin, 

I don't have the time for this at the moment, so feel free to use your new 
powers.  I think putting it into contrib and marking it as experimental is good 
(so that it is not bound as strictly by back-compat rules).

I have some needs that I would like to see worked in, but I haven't had the 
time, and I don't think they should hold back others, as it is obviously in 
significant use already.  They are also nothing earth-shattering.

So, it's all yours.  Enjoy.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fuad Efendi updated SOLR-665:
-

Attachment: SimplestConcurrentLRUCache.java

> FIFO Cache (Unsynchronized): 9x times performance boost
> ---
>
> Key: SOLR-665
> URL: https://issues.apache.org/jira/browse/SOLR-665
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
> Environment: JRockit R27 (Java 6)
>Reporter: Fuad Efendi
> Attachments: ConcurrentFIFOCache.java, ConcurrentFIFOCache.java, 
> ConcurrentLRUCache.java, ConcurrentLRUWeakCache.java, FIFOCache.java, 
> SimplestConcurrentLRUCache.java
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Attached is a modified version of LRUCache where 
> 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that 
> "reordering"/true (the performance bottleneck of LRU) is replaced by 
> "insertion-order"/false (so that it becomes FIFO)
> 2. Almost all (absolutely unnecessary) synchronized statements are commented out
> See discussion at 
> http://www.nabble.com/LRUCache---synchronized%21--td16439831.html
> Performance metrics (taken from SOLR Admin):
> LRU
> Requests: 7638
> Average Time-Per-Request: 15300
> Average Request-per-Second: 0.06
> FIFO:
> Requests: 3355
> Average Time-Per-Request: 1610
> Average Request-per-Second: 0.11
> Performance increased 9 times, which roughly corresponds to the number of 
> CPUs in the system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org)
> Current number of documents: 7494689
> name:  filterCache  
> class:org.apache.solr.search.LRUCache  
> version:  1.0  
> description:  LRU Cache(maxSize=1000, initialSize=1000)  
> stats:lookups : 15966954582
> hits : 16391851546
> hitratio : 0.102
> inserts : 4246120
> evictions : 0
> size : 2668705
> cumulative_lookups : 16415839763
> cumulative_hits : 16411608101
> cumulative_hitratio : 0.99
> cumulative_inserts : 4246246
> cumulative_evictions : 0 
> Thanks
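
For reference, a minimal self-contained illustration of point 1 above (this is 
a sketch of the LinkedHashMap mechanism, not the attached FIFOCache.java):

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class FifoCacheSketch<K, V> extends LinkedHashMap<K, V> {
  private final int maxSize;

  public FifoCacheSketch(int maxSize) {
    // false = insertion order: get() does no relinking, so eviction is FIFO.
    // true would give access order (LRU), paying a relink on every get().
    super(maxSize, 0.75f, false);
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxSize;  // evict the oldest-inserted entry when full
  }
}
{code}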

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fuad Efendi updated SOLR-665:
-

Attachment: (was: SimplestLRUCache.java)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-469:
---

Attachment: SOLR-469-contrib.patch

Sorry for the spam due to my (multiple) mistakes. I think this one is the one :)

A new patch containing the following changes:

# On further thought about interfaces vs. abstract classes, we have decided to 
replace all interfaces with abstract classes. Transformer, Context, 
EntityProcessor, Evaluator, DataSource and VariableResolver are now abstract 
classes.
# The bug reported by Jonathan has been fixed, and TestCachedEntityProcessor 
has been updated to catch it. This exception used to be thrown only if the 
first request to CachedEntityProcessor needed a row which was not in the cache. 
Subsequent requests were not affected.
# Javadoc improvements. In particular, all the API-related classes are marked 
as experimental and subject to change.
# Propset Id in all classes.

Users who have written their own custom transformers against the API will need 
to change their code (a before/after sketch follows). Sorry for the 
inconvenience.
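
A hypothetical before/after for a custom transformer, assuming the 
transformRow(Map, Context) signature; the package and method names here are 
assumptions, not taken from the patch:

{code}
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Before: public class TrimTransformer implements Transformer { ... }
// After: extend the abstract class instead of implementing the interface.
public class TrimTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    // Trim every string value in the row, then return the (modified) row.
    for (Map.Entry<String, Object> e : row.entrySet()) {
      if (e.getValue() instanceof String) {
        e.setValue(((String) e.getValue()).trim());
      }
    }
    return row;
  }
}
{code}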

Grant - Is there anything else we need to do to get it committed?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-469:
---

Attachment: (was: SOLR-469.patch)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fuad Efendi updated SOLR-665:
-

Attachment: (was: ConcurrentLRUWeakCache.java)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-469:
---

Comment: was deleted


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-469:
---

Attachment: SOLR-469.patch


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618370#action_12618370
 ] 

Shalin Shekhar Mangar commented on SOLR-469:



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-469) Data Import RequestHandler

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-469:
---

Comment: was deleted


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fuad Efendi updated SOLR-665:
-

Attachment: SimplestLRUCache.java

> FIFO Cache (Unsynchronized): 9x times performance boost
> ---
>
> Key: SOLR-665
> URL: https://issues.apache.org/jira/browse/SOLR-665
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
> Environment: JRockit R27 (Java 6)
>Reporter: Fuad Efendi
> Attachments: ConcurrentFIFOCache.java, ConcurrentFIFOCache.java, 
> ConcurrentLRUCache.java, ConcurrentLRUWeakCache.java, 
> ConcurrentLRUWeakCache.java, ConcurrentLRUWeakCache.java, FIFOCache.java, 
> SimplestLRUCache.java
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Attached is a modified version of LRUCache where 
> 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that 
> access-order/true (the performance bottleneck of LRU) is replaced with 
> insertion-order/false, making the cache FIFO (a sketch follows this 
> description)
> 2. Almost all (entirely unnecessary) synchronized statements are commented 
> out
> See discussion at 
> http://www.nabble.com/LRUCache---synchronized%21--td16439831.html
> Performance metrics (taken from SOLR Admin):
> LRU
> Requests: 7638
> Average Time-Per-Request: 15300
> Average Request-per-Second: 0.06
> FIFO:
> Requests: 3355
> Average Time-Per-Request: 1610
> Average Request-per-Second: 0.11
> Performance increased 9 times, which roughly corresponds to the number of 
> CPUs in the system, http://www.tokenizer.org/ (Shopping Search Engine at 
> Tokenizer.org)
> Current number of documents: 7494689
> name:  filterCache  
> class:org.apache.solr.search.LRUCache  
> version:  1.0  
> description:  LRU Cache(maxSize=1000, initialSize=1000)  
> stats:lookups : 15966954582
> hits : 16391851546
> hitratio : 0.102
> inserts : 4246120
> evictions : 0
> size : 2668705
> cumulative_lookups : 16415839763
> cumulative_hits : 16411608101
> cumulative_hitratio : 0.99
> cumulative_inserts : 4246246
> cumulative_evictions : 0 
> Thanks
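A self-contained sketch of the LinkedHashMap distinction that point 1 relies 
on, assuming a simple size-bounded map; this is an illustration, not the 
attached patch:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only. Like the patch, this is NOT thread-safe without
// external synchronization.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxSize;

  // accessOrder=true  -> entries are reordered on every get(): LRU
  //                      eviction, but the reordering is the bottleneck
  //                      discussed above.
  // accessOrder=false -> insertion order is kept: FIFO eviction, no
  //                      reordering on reads.
  public BoundedCache(int maxSize, boolean accessOrder) {
    super(maxSize, 0.75f, accessOrder);
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxSize;
  }

  public static void main(String[] args) {
    BoundedCache<Integer, String> fifo =
        new BoundedCache<Integer, String>(2, false);
    fifo.put(1, "a");
    fifo.put(2, "b");
    fifo.get(1);      // no reordering in insertion-order (FIFO) mode
    fifo.put(3, "c"); // evicts key 1, the oldest insertion
    // Prints [2, 3]; with accessOrder=true it would print [1, 3].
    System.out.println(fifo.keySet());
  }
}
{code}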

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fuad Efendi updated SOLR-665:
-

Attachment: (was: SimplestLRUCache.java)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fuad Efendi updated SOLR-665:
-

Attachment: SimplestLRUCache.java


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618339#action_12618339
 ] 

funtick edited comment on SOLR-665 at 7/30/08 7:08 AM:
---

Noble, thanks for the feedback!

Of course my code is buggy, but I only wanted _to illustrate_ the simplest 
idea; I am extremely busy with other stuff (Liferay) and can't focus on SOLR 
improvements... maybe during the weekend.

bq. ...will always evaluate to false. And the reference will always have one 
value
- yes, this is a bug. There are other bugs too...

bq. We must be removing the entry which was accessed first (not last).. 
I meant (and the code does too) the same; probably wrong wording

bq. And the static volatile counter is not threadsafe. 
Do we _really-really_ need thread safety here? By using 'volatile' I only 
prevent _some_ JVMs from trying to optimize some code (and cause problems).

bq. There is no need to use a WeakReference anywhere
Agree... 

bq. To get that you must maintian a linkedlist the way linkedhashmap 
maintains. No other shortcut. 
Maybe... but that looks similar to Arrays.sort(), TreeSet, etc., which I am 
trying to avoid. 'No other shortcut' - maybe, but I am unsure.

Thanks!



  was (Author: funtick):
Noble, thanks for the feedback!

Of course my code is buggy, but I only wanted _to illustrate_ the simplest 
idea; I am extremely busy with other stuff (Liferay) and can't focus on SOLR 
improvements... maybe during the weekend.

bq. ...will always evaluate to false. And the reference will always have one 
value
- yes, this is a bug. There are other bugs too...

bq. We must be removing the entry which was accessed first (not last).. 
I meant (and the code does too) the same; probably wrong wording

bq. And the static volatile counter is not threadsafe. 
Do we _really-really_ need thread safety here? By using 'volatile' I only 
prevent _some_ JVMs from trying to optimize some code (and cause problems 
with per-instance variables which never change).

bq. There is no need to use a WeakReference anywhere
Agree... 

bq. To get that you must maintian a linkedlist the way linkedhashmap 
maintains. No other shortcut. 
Maybe... but that looks similar to Arrays.sort(), TreeSet, etc., which I am 
trying to avoid. 'No other shortcut' - maybe, but I am unsure.

Thanks!


  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618339#action_12618339
 ] 

funtick edited comment on SOLR-665 at 7/30/08 7:06 AM:
---

Noble, thanks for the feedback!

Of course my code is buggy, but I only wanted _to illustrate_ the simplest 
idea; I am extremely busy with other stuff (Liferay) and can't focus on SOLR 
improvements... maybe during the weekend.

bq. ...will always evaluate to false. And the reference will always have one 
value
- yes, this is a bug. There are other bugs too...

bq. We must be removing the entry which was accessed first (not last).. 
I meant (and the code does too) the same; probably wrong wording

bq. And the static volatile counter is not threadsafe. 
Do we _really-really_ need thread safety here? By using 'volatile' I only 
prevent _some_ JVMs from trying to optimize some code (and cause problems 
with per-instance variables which never change).

bq. There is no need to use a WeakReference anywhere
Agree... 

bq. To get that you must maintian a linkedlist the way linkedhashmap 
maintains. No other shortcut. 
Maybe... but that looks similar to Arrays.sort(), TreeSet, etc., which I am 
trying to avoid. 'No other shortcut' - maybe, but I am unsure.

Thanks!



  was (Author: funtick):
Nobble, thanks for the feedback!

Of course my code is buggy, but I only wanted _to illustrate_ the simplest 
idea; I am extremely busy with other stuff (Liferay) and can't focus on SOLR 
improvements... maybe during the weekend.

bq. ...will always evaluate to false. And the reference will always have one 
value
- yes, this is a bug. There are other bugs too...

bq. We must be removing the entry which was accessed first (not last).. 
I meant (and the code does too) the same; probably wrong wording

bq. And the static volatile counter is not threadsafe. 
Do we _really-really_ need thread safety here? By using 'volatile' I only 
prevent _some_ JVMs from trying to optimize some code (and cause problems 
with per-instance variables which never change).

bq. There is no need to use a WeakReference anywhere
Agree... 

bq. To get that you must maintian a linkedlist the way linkedhashmap 
maintains. No other shortcut. 
Maybe... but that looks similar to Arrays.sort(), TreeSet, etc., which I am 
trying to avoid. 'No other shortcut' - maybe, but I am unsure.

Thanks!


  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost

2008-07-30 Thread Fuad Efendi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618339#action_12618339
 ] 

Fuad Efendi commented on SOLR-665:
--

Nobble, thanks for the feedback!

Of course my code is buggy, but I only wanted _to illustrate_ the simplest 
idea; I am extremely busy with other stuff (Liferay) and can't focus on SOLR 
improvements... maybe during the weekend.

bq. ...will always evaluate to false. And the reference will always have one 
value
- yes, this is a bug. There are other bugs too...

bq. We must be removing the entry which was accessed first (not last).. 
I meant (and the code does too) the same; probably wrong wording

bq. And the static volatile counter is not threadsafe. 
Do we _really-really_ need thread safety here? By using 'volatile' I only 
prevent _some_ JVMs from trying to optimize some code (and cause problems 
with per-instance variables which never change). A sketch follows this 
comment.

bq. There is no need to use a WeakReference anywhere
Agree... 

bq. To get that you must maintian a linkedlist the way linkedhashmap 
maintains. No other shortcut. 
Maybe... but that looks similar to Arrays.sort(), TreeSet, etc., which I am 
trying to avoid. 'No other shortcut' - maybe, but I am unsure.

Thanks!
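For context on the volatile-counter point, a small sketch (mine, not from the 
attachments): volatile guarantees visibility of writes, but counter++ is a 
read-modify-write that can lose updates under contention; AtomicLong makes 
the increment atomic.

{code}
import java.util.concurrent.atomic.AtomicLong;

public class CounterDemo {
  // Visible across threads, but 'visibleCounter++' is read / add / write:
  // two threads can interleave and lose an update.
  static volatile long visibleCounter = 0;

  // Atomic read-modify-write; no updates are lost.
  static final AtomicLong atomicCounter = new AtomicLong();

  public static void main(String[] args) throws InterruptedException {
    Runnable work = new Runnable() {
      public void run() {
        for (int i = 0; i < 100000; i++) {
          visibleCounter++;                // lossy under contention
          atomicCounter.incrementAndGet(); // always exact
        }
      }
    };
    Thread t1 = new Thread(work);
    Thread t2 = new Thread(work);
    t1.start(); t2.start();
    t1.join(); t2.join();
    // atomicCounter is exactly 200000; visibleCounter is usually less.
    System.out.println(visibleCounter + " vs " + atomicCounter.get());
  }
}
{code}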




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr 1.3 release date

2008-07-30 Thread Yonik Seeley
On Wed, Jul 30, 2008 at 5:01 AM, Shalin Shekhar Mangar
<[EMAIL PROTECTED]> wrote:
> We can't release with Lucene trunk jars, can we?

We can if we feel comfortable with it.

-Yonik


[jira] Created: (SOLR-668) Snapcleaner removes newest snapshots in Solaris

2008-07-30 Thread Gabriel Hernandez (JIRA)
Snapcleaner removes newest snapshots in Solaris
---

 Key: SOLR-668
 URL: https://issues.apache.org/jira/browse/SOLR-668
 Project: Solr
  Issue Type: Bug
  Components: replication
Affects Versions: 1.2
 Environment: Solaris 10
Reporter: Gabriel Hernandez
Priority: Minor
 Fix For: 1.2


When running the snapcleaner script from cron with the -N option, the script 
removes the newest snapshots instead of the oldest ones. I verified that this 
can be corrected by making the following change in the snapcleaner script; 
adding the -r flag to ls reverses the sort order, so the oldest snapshots are 
listed first:

elif [[ -n ${num} ]]
then
logMessage cleaning up all snapshots except for the most recent 
${num} ones
unset snapshots count

- snapshots=`ls -cd ${data_dir}/snapshot.* 2>/dev/null`

+ snapshots=`ls -crd ${data_dir}/snapshot.* 2>/dev/null` 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr 1.3 release date

2008-07-30 Thread Shalin Shekhar Mangar
There are more: SpellCheckComponent and support for partial optimize also 
depend on features in Lucene trunk.

On Wed, Jul 30, 2008 at 2:40 PM, Lars Kotthoff <[EMAIL PROTECTED]> wrote:

> > We can't release with Lucene trunk jars, can we?
>
> Ah, that reminds me of something. The feature added in SOLR-610 sort-of
> depends
> on LUCENE-1321 to work properly. This will only bite people who use
> SOLR-610 and
> try to highlight something at the very end of the field, so I don't think
> it's
> much of an issue.
>
> Nevertheless it's a reason to upgrade to the next release of Lucene as soon
> as
> it's available.
>
> Lars
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 1.3 release date

2008-07-30 Thread Lars Kotthoff
> We can't release with Lucene trunk jars, can we?

Ah, that reminds me of something. The feature added in SOLR-610 sort-of depends
on LUCENE-1321 to work properly. This will only bite people who use SOLR-610 and
try to highlight something at the very end of the field, so I don't think it's
much of an issue.

Nevertheless it's a reason to upgrade to the next release of Lucene as soon as
it's available.

Lars


[jira] Commented: (SOLR-632) Upgrade bundled Jetty with latest from vendor

2008-07-30 Thread Lars Kotthoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618260#action_12618260
 ] 

Lars Kotthoff commented on SOLR-632:


The only workaround is to disable the test :)

Switching to 6.1.11 means that, for anybody using Solr behind something that 
does HTTP caching, error messages might be cached. In practice it's probably 
not a big deal, as most caching configurations don't cache non-200 responses 
anyway.

This could have a negative impact if the cache is configured to store 
everything that doesn't explicitly disallow caching, and the content is 
cached for a very long time or an error occurs only briefly and more or less 
fixes itself.
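For caches at that aggressive end of the spectrum, one conceivable guard -- a 
sketch under assumptions, not an actual Solr or Jetty facility -- is a 
servlet filter that stamps error responses with an explicit no-cache 
directive:

{code}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

// Servlet 2.5 has no response.getStatus(), so the setters are intercepted.
// Whether headers set here survive sendError() is container-dependent;
// this is an illustration, not production code.
public class NoCacheErrorFilter implements Filter {
  public void init(FilterConfig config) {}
  public void destroy() {}

  public void doFilter(ServletRequest req, ServletResponse res,
                       FilterChain chain)
      throws IOException, ServletException {
    final HttpServletResponse httpRes = (HttpServletResponse) res;
    chain.doFilter(req, new HttpServletResponseWrapper(httpRes) {
      private void stamp(int sc) {
        if (sc >= 400 && !isCommitted()) {
          httpRes.setHeader("Cache-Control", "no-cache, no-store");
        }
      }
      public void setStatus(int sc) { stamp(sc); super.setStatus(sc); }
      public void sendError(int sc) throws IOException {
        stamp(sc); super.sendError(sc);
      }
      public void sendError(int sc, String msg) throws IOException {
        stamp(sc); super.sendError(sc, msg);
      }
    });
  }
}
{code}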

> Upgrade bundled Jetty with latest from vendor
> -
>
> Key: SOLR-632
> URL: https://issues.apache.org/jira/browse/SOLR-632
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
>Reporter: Norberto Meijome
>Assignee: Erik Hatcher
>Priority: Minor
> Fix For: 1.3
>
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> The Jetty that is bundled for the example application is version 6.1.3, which 
> is over a year old.
> We should upgrade Jetty to the latest, 6.1.11.
> I am not sure how to attach a patch to remove files, so these are the steps:
> Using as base the root of 'apache-solr-nightly':
> DELETE:
> example/lib/jetty-6.1.3.jar
> example/lib/jetty-util-6.1.3.jar
> example/lib/servlet-api-2.5-6.1.3.jar
> ADD
> example/lib/jetty-6.1.11.jar
> example/lib/jetty-util-6.1.11.jar
> example/lib/servlet-api-2.5-6.1.11.jar
> ---
> The files to be added can be found in Jetty's binary distribution file :
> http://dist.codehaus.org/jetty/jetty-6.1.11/jetty-6.1.11.zip
> I couldn't find any noticeable changes in jetty.xml that should be carried 
> over.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Solr 1.3 release date

2008-07-30 Thread Shalin Shekhar Mangar
We can't release with Lucene trunk jars, can we?

On Mon, Jul 28, 2008 at 6:24 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Thu, Jul 10, 2008 at 11:56 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> > So how about we make a 1.3 branch in about 2 weeks (the 25th), and
> > release soon after.
>
> Given the flurry of recent activity, seems like extending this a
> little longer (another week?) would be a good idea.
>
> -Yonik
>



-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-632) Upgrade bundled Jetty with latest from vendor

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618257#action_12618257
 ] 

Shalin Shekhar Mangar commented on SOLR-632:


Are we still targeting this for Solr 1.3? Is there an easy workaround for 
the test?

Seems like 6.1.12 will take time (the RC1 was released today)
http://www.nabble.com/Jetty-6.1.12-rc1-Available!-ts18721171.html


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-438) Allow multiple stopword files

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618247#action_12618247
 ] 

Shalin Shekhar Mangar commented on SOLR-438:


SOLR-663 has been committed. We can mark this issue as resolved.

> Allow multiple stopword files
> -
>
> Key: SOLR-438
> URL: https://issues.apache.org/jira/browse/SOLR-438
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Otis Gospodnetic
>Assignee: Otis Gospodnetic
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-438.patch, SOLR-438.patch
>
>
> It wouldn't hurt Solr (StopFilterFactory) to allow one to specify multiple 
> stopword files.
> I've patched Solr to support this, for example:
> <filter class="solr.StopFilterFactory" words="hr_stopwords.txt, 
> hr_stopmorphemes.txt"/>
> I'll upload a patch shortly and commit later this week.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-663) Allow multiple files for stopwords, protwords and synonyms

2008-07-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-663.


Resolution: Fixed

Committed revision 680935.

> Allow multiple files for stopwords, protwords and synonyms
> --
>
> Key: SOLR-663
> URL: https://issues.apache.org/jira/browse/SOLR-663
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 1.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 1.3
>
> Attachments: SOLR-663.patch
>
>
> Allow multiple files, separated by commas (a literal comma can be escaped 
> with a backslash), for StopFilterFactory, EnglishPorterFilterFactory, 
> KeepWordFilterFactory and SynonymFilterFactory
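A sketch of the splitting behavior this implies, assuming only what the 
summary states (comma-separated names, backslash escaping a literal comma); 
this is an illustration, not the committed patch:

{code}
import java.util.ArrayList;
import java.util.List;

public class FileListSplitter {
  // Splits "a.txt, b\,c.txt" into ["a.txt", "b,c.txt"].
  public static List<String> split(String fileNames) {
    List<String> result = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < fileNames.length(); i++) {
      char ch = fileNames.charAt(i);
      if (ch == '\\' && i + 1 < fileNames.length()
          && fileNames.charAt(i + 1) == ',') {
        current.append(',');  // escaped comma belongs to the file name
        i++;
      } else if (ch == ',') {
        result.add(current.toString().trim());
        current.setLength(0);
      } else {
        current.append(ch);
      }
    }
    result.add(current.toString().trim());
    return result;
  }

  public static void main(String[] args) {
    // Prints [stopwords.txt, more,stop.txt]
    System.out.println(split("stopwords.txt, more\\,stop.txt"));
  }
}
{code}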

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.