[jira] Updated: (SOLR-19) pom.xml to support maven2

2008-01-26 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-19:
--

Attachment: pom.xml

updated for lucene 2.3

(still not a real solution)

> pom.xml to support maven2
> -
>
> Key: SOLR-19
> URL: https://issues.apache.org/jira/browse/SOLR-19
> Project: Solr
>  Issue Type: New Feature
> Environment: all
>Reporter: Darren Erik Vengroff
>Priority: Minor
> Attachments: pom.xml, pom.xml, pom.xml, solr-maven2.zip, 
> solr-test-maven.zip
>
>
> I created a preliminary pom.xml to support building solr with maven2.
> Currently it compiles all the core solr code into a jar and runs the junit 
> tests.
> Dropping this pom.xml into the root dir, in parallel with build.xml, will let 
> those who wish to build with maven2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-467) Remove 'core' options from solrj

2008-01-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562879#action_12562879
 ] 

Yonik Seeley commented on SOLR-467:
---

The other alternative to creating a new SolrServer for talking to different 
cores or servers is to treat it more like HttpClient... the URL could be 
specified in the request itself (and could default to the URL of the SolrServer 
if unset).  Not sure if it's worth the complexity though since SolrServer 
instances should be very lightweight now.  I'm considering simply creating one 
per request to be sent in distributed search... it gets rid of any caching 
complexity.
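
To make the two options above concrete, here is a rough sketch. It is written in Python for brevity and does not use the real solrj API: `Server`, `Request`, `resolve_url` and `fan_out` are hypothetical names made up for illustration. It only shows a per-request URL that falls back to the server's URL, and the "one throwaway instance per request" alternative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a lightweight SolrServer bound to one URL."""
    base_url: str


@dataclass(frozen=True)
class Request:
    """Hypothetical request; url is optional and overrides the server's URL."""
    path: str
    url: Optional[str] = None


def resolve_url(server: Server, request: Request) -> str:
    # HttpClient-style: the request may carry its own URL and
    # falls back to the server's URL when it is unset.
    base = request.url if request.url is not None else server.base_url
    return base.rstrip("/") + request.path


def fan_out(shard_urls: List[str], path: str) -> List[str]:
    # Alternative: build one throwaway Server per request, so no
    # instances need to be cached between distributed-search calls.
    return [resolve_url(Server(u), Request(path)) for u in shard_urls]
```

If instances really are cheap to create, the second form keeps the request type simpler, since it never needs a URL field of its own.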

> Remove 'core' options from solrj
> 
>
> Key: SOLR-467
> URL: https://issues.apache.org/jira/browse/SOLR-467
> Project: Solr
>  Issue Type: Task
>Affects Versions: 1.3
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 1.3
>
>
> Remove the option to change cores for a SolrServer.  The core should be 
> selected in the constructor -- for Http version, this is with the path, and 
> the Embedded version can get registered with a name or SolrCore.
> This will require creating a new SolrServer for multi-core admin operations.




[jira] Issue Comment Edited: (SOLR-319) changes SynonymFilterFactory to "Analyze" synonyms file

2008-01-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562871#action_12562871
 ] 

Yonik Seeley edited comment on SOLR-319 at 1/26/08 7:27 AM:


Perhaps it should be a full analyzer rather than just a tokenizer?
In the query elevation (query boosting) component, a field type can be 
specified, and the analyzer from that is used.

Oh, I just saw that Hoss already brought up that idea long ago...

  was (Author: Yonik Seeley):
Perhaps it should be a full analyzer rather than just a tokenizer?
In the query elevation (query boosting) component, a field type can be 
specified, and the analyzer from that is used.

  
> changes SynonymFilterFactory to "Analyze" synonyms file
> --
>
> Key: SOLR-319
> URL: https://issues.apache.org/jira/browse/SOLR-319
> Project: Solr
>  Issue Type: Improvement
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-319.patch, SOLR-319.patch, SOLR-319.patch
>
>
> WHAT:
> Currently, SynonymFilterFactory works very well with N-gram tokenizers 
> (CJKTokenizer, for example).
> But we have to take care with how the rules are written in synonyms.txt.
> For example, if I use CJKTokenizer (which works as a bi-gram tokenizer for 
> CJK chars) and want C1C2C3 to map to C4C5C6,
> I have to write the rule as follows:
> C1C2 C2C3 => C4C5 C5C6
> But I want to write it as "C1C2C3=>C4C5C6". This patch allows that. It is also 
> helpful for sharing synonyms.txt.
> HOW:
> A tokenFactory attribute is added to <filter class="solr.SynonymFilterFactory"/>.
> If the attribute is specified, SynonymFilterFactory uses the TokenizerFactory 
> to create a Tokenizer.
> Then SynonymFilterFactory uses the Tokenizer to get tokens from the rules in 
> the synonyms.txt file.
> sample-1: CJKTokenizer
>  positionIncrementGap="100">
>   
> 
>  synonyms="ngram_synonym_test_ja.txt"
>   ignoreCase="true" expand="true" 
> tokenFactory="solr.CJKTokenizerFactory"/>
> 
>   
>   
> 
> 
>   
> 
> sample-2: NGramTokenizer
>  positionIncrementGap="100">
>   
>  maxGramSize="2"/>
> 
>   
>   
>  maxGramSize="2"/>
>  synonyms="ngram_synonym_test_ngram.txt"
>   ignoreCase="true" expand="true"
>   tokenFactory="solr.NGramTokenizerFactory" 
> minGramSize="2" maxGramSize="2"/>
> 
>   
> 
> backward compatibility:
> Yes. If you omit the tokenFactory attribute from the <filter class="solr.SynonymFilterFactory"/> tag, it works as usual.
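
The rule rewriting described in the WHAT section can be illustrated roughly as follows. This is a Python sketch, not the patch itself: `bigrams` is a simplified stand-in for what CJKTokenizer does to a run of CJK characters, and `analyze_rule` is a hypothetical helper that handles only a single "lhs=>rhs" rule (no comma-separated alternatives). Plain letters stand in for the characters C1..C6.

```python
from typing import List, Tuple


def bigrams(text: str) -> List[str]:
    """Overlapping 2-character tokens; a single character is kept as-is."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def analyze_rule(rule: str) -> Tuple[List[str], List[str]]:
    """Split 'lhs=>rhs' and run both sides through the same tokenizer."""
    lhs, rhs = rule.split("=>", 1)
    return bigrams(lhs.strip()), bigrams(rhs.strip())


# "ABC=>DEF" is analyzed into the equivalent of "AB BC => DE EF",
# which is the form that currently has to be written by hand.
print(analyze_rule("ABC=>DEF"))
```

The point of running the rule file through the same tokenizer as the field is that synonyms.txt can then be written once in its natural form and shared between field types that tokenize differently.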




[jira] Commented: (SOLR-319) changes SynonymFilterFactory to "Analyze" synonyms file

2008-01-26 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562871#action_12562871
 ] 

Yonik Seeley commented on SOLR-319:
---

Perhaps it should be a full analyzer rather than just a tokenizer?
In the query elevation (query boosting) component, a field type can be 
specified, and the analyzer from that is used.





[jira] Updated: (SOLR-319) changes SynonymFilterFactory to "Analyze" synonyms file

2008-01-26 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-319:


Attachment: SOLR-319.patch

updated for current trunk (SOLR-466).




[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-01-26 Thread Thomas Peuss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562848#action_12562848
 ] 

Thomas Peuss commented on SOLR-127:
---

The cacheHeaderSeed is a good idea. It is like the serial number in a DNS 
zone file. The downside is that you have to change it manually (but Solr 
users are clever guys ;-) ). I would give the seed no special meaning - it is 
just a string that we mix with the version number of the index. 
The user can choose whatever he wants there as long as he changes it when the 
config changes substantially. Something like _cacheHeaderSeed="20080126123300"_ 
should be as good as _cacheHeaderSeed="version23"_. As we are caching the ETag 
now, we can use an MD5 or SHA1 hash for the ETag as well. We simply feed the 
cacheHeaderSeed and the index version number into the hash function and 
Base64-encode the resulting digest. That also obfuscates the index version 
for the paranoid ones, and the ETag always has the same size, 
independent of the length of the seed. Additionally, the ETag changes completely 
if only one bit of the input changes. This makes the _equals_ check for the 
ETag a bit faster as well.
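
A minimal sketch of that idea, in Python rather than Solr's Java. The function name `make_etag`, the ":" separator between seed and version, and the choice of SHA1 over MD5 are my assumptions for illustration, not anything taken from the patch.

```python
import base64
import hashlib


def make_etag(cache_header_seed: str, index_version: int) -> str:
    """Hash seed + index version and Base64-encode the digest."""
    # The ":" separator is an arbitrary choice for this sketch.
    material = f"{cache_header_seed}:{index_version}".encode("utf-8")
    # A SHA1 digest is always 20 bytes, whatever the seed length.
    digest = hashlib.sha1(material).digest()
    # HTTP entity tags are quoted strings.
    return '"' + base64.b64encode(digest).decode("ascii") + '"'


print(make_etag("version23", 1201348800))
print(make_etag("20080126123300", 1201348800))
```

Both calls give a tag of the same length, and changing either the seed or the index version gives an unrelated-looking value.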

The problem I see with cacheHeaderVersion being a timestamp is that you can 
really break your caching headers if you put a future timestamp in there. This 
is not allowed by the RFC. Of course we can check for a future timestamp, 
log a warning, and use the current time instead.
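
The check could look roughly like this. It is a Python sketch only: `safe_last_modified` and the logger name are hypothetical, and timestamps are plain seconds since the epoch.

```python
import logging
import time
from typing import Optional

log = logging.getLogger("cache-headers")


def safe_last_modified(configured_ts: float, now: Optional[float] = None) -> float:
    """Return configured_ts, or the current time if it lies in the future."""
    now = time.time() if now is None else now
    if configured_ts > now:
        log.warning("timestamp %s is in the future; using current time",
                    configured_ts)
        return now
    return configured_ts
```

The optional `now` argument is there only so the behaviour can be tested without depending on the wall clock.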

If I remember right, XML attributes don't need a value. So we can do the 
following:
{code}

...becomes...
Cache-Control: max-age="23", no-cache, must-revalidate, private="Foo", asdf, 
qwert="666"
{code}
But again, it is a very good idea to be flexible here. The named list syntax 
might be easier to handle in the code, though. A regex solution should work as 
well (but should fail gracefully, with a warning written to the logfile). 
max-age is the only value that is of interest to the code.
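
For the regex variant, something along these lines might do. This is a Python sketch under my own assumptions: `parse_max_age` is a hypothetical helper, and the pattern accepts the quoted value form used in the example header above as well as the unquoted one.

```python
import logging
import re
from typing import Optional

log = logging.getLogger("cache-headers")

# max-age directive at the start of the header or after a comma/space;
# the numeric value may optionally be wrapped in double quotes.
_MAX_AGE = re.compile(r'(?:^|[,\s])max-age\s*=\s*"?(\d+)"?', re.IGNORECASE)


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return max-age in seconds, or None (with a logged warning) if absent."""
    match = _MAX_AGE.search(cache_control)
    if match is None:
        log.warning("no usable max-age in Cache-Control: %r", cache_control)
        return None
    return int(match.group(1))
```

Everything else in the header is passed through untouched; a missing or malformed max-age only produces a warning and a None result instead of an error.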

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 1.3
>
> Attachments: CacheUnitTest.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch
>
>
> an offhand comment I saw recently reminded me of something that really bugged 
> me about the search solution i used *before* Solr -- it didn't play nicely 
> with HTTP caches that might be sitting in front of it.
> at the moment, Solr doesn't put particularly useful info in the HTTP 
> Response headers to aid in caching (ie: Last-Modified), responds to all HEAD 
> requests with a 400, and doesn't do anything special with If-Modified-Since.
> at the very least, we can set a Last-Modified based on when the current 
> IndexReader was opened (if not the Date on the IndexReader) and use the same 
> info to determine how to respond to If-Modified-Since requests.
> (for the record, i think the reason this hasn't occurred to me in the 2+ years 
> i've been using Solr, is because with the internal caching, i've yet to need 
> to put a proxy cache in front of Solr)
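
The Last-Modified / If-Modified-Since handling described in the issue could be sketched like this. It is Python rather than servlet code, and `conditional_response` and `reader_opened_at` are hypothetical names: the only input assumed is the time (seconds since the epoch) at which the current IndexReader was opened.

```python
from email.utils import formatdate, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_response(reader_opened_at: float,
                         if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a GET, given when the reader was opened."""
    last_modified = int(reader_opened_at)  # HTTP dates have 1-second resolution
    headers = {"Last-Modified": formatdate(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable header: ignore it, send a full response
        if since is not None and last_modified <= since:
            return 304, headers  # nothing changed since the client's copy
    return 200, headers
```

A client that echoes back the Last-Modified value it was given gets a 304 until a new reader is opened, at which point the next request gets a full 200 response again.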
