Re: Multiple facet.prefix?

2009-08-20 Thread Avlesh Singh
Well, I have problems using the "filter" terminology. People who have
traditionally known the "facet.prefix" parameter would like to stick to
that. Having both of these might lead to confusion.
I have no problems with using local param syntax though. This should be
fine:
facet.field={!prefix=foo prefix=bar}myfield

Filtering within facets is a good idea, but let's limit the scope of this
enhancement to multiple "facet.prefix" values for any facet.field.

Cheers
Avlesh

On Fri, Aug 21, 2009 at 11:54 AM, Yonik Seeley
wrote:

> On Fri, Aug 21, 2009 at 2:18 AM, Avlesh Singh wrote:
> > So, this is it -
> > facet.field=myField&f.myField.facet.prefix=foo&f.myField.facet.prefix=bar
> > Right?
>
> I'd actually prefer a less ambiguous and more general syntax.
> It seems like you want to include a filter in faceting that you don't
> want applied to the normal search results (else you would have just
> used fq=myfield:(A* OR B*))
>
> So what about something like:
>
> facet.field={!filter="myfield:A* OR B*"}myfield
> or with dereferencing,
> facet.field={!filter=$f}myfield&f=myfield:A* OR B*
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Multiple facet.prefix?

2009-08-20 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 2:18 AM, Avlesh Singh wrote:
> So, this is it -
> facet.field=myField&f.myField.facet.prefix=foo&f.myField.facet.prefix=bar
> Right?

I'd actually prefer a less ambiguous and more general syntax.
It seems like you want to include a filter in faceting that you don't
want applied to the normal search results (else you would have just
used fq=myfield:(A* OR B*))

So what about something like:

facet.field={!filter="myfield:A* OR B*"}myfield
or with dereferencing,
facet.field={!filter=$f}myfield&f=myfield:A* OR B*

-Yonik
http://www.lucidimagination.com
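[Editor's sketch: Yonik's dereferencing variant above can be assembled client-side with ordinary URL encoding. The endpoint, field name, and query values below are illustrative assumptions, not taken from the thread.]

```python
from urllib.parse import urlencode

# Build the facet.field local-params request with parameter
# dereferencing ($f), so the filter query stays readable.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "{!filter=$f}myfield"),
    ("f", "myfield:A* OR B*"),
]
query = urlencode(params)
url = "http://localhost:8983/solr/select?" + query
print(url)
```

urlencode percent-escapes the local-params braces and the `$` dereference, so the raw syntax shown in the emails never needs manual escaping.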


Re: Multiple facet.prefix?

2009-08-20 Thread Avlesh Singh
>
> So if you asked for the top 10, you would be fine with receiving all A* or
> all B* if those happened to be the top?  Or do you really want the top 10 A*
> and the top 10 B*?

For all use cases so far, yes.

No, that's just the thing - multiple prefixes specified this way would be
> the same logical facet.. hence no way to specify different limits, sorts, or
> whatever... those are per facet.
>
Awesome, we are all on the same page then.

So, this is it -
facet.field=myField&f.myField.facet.prefix=foo&f.myField.facet.prefix=bar
Right?

Cheers
Avlesh

On Fri, Aug 21, 2009 at 11:46 AM, Yonik Seeley
wrote:

> On Fri, Aug 21, 2009 at 2:10 AM, Avlesh Singh wrote:
> > Different ways to look at it Yonik. For things that I have needed this
> for,
> > so far (field:A* OR field:B*) was just as fine.
>
> So if you asked for the top 10, you would be fine with receiving all
> A* or all B* if those happened to be the top?  Or do you really want
> the top 10 A* and the top 10 B*?
>
> > Anyways, for me to file the ticket, what usage are we looking for? As
> > underneath?
> > facet.field=myField&f.myField.facet.prefix=foo&f.myField.facet.prefix=bar
> >
> > How would sort, limit and other parameters be passed? As underneath?
> > f.foo.facet.prefix.foo.sort=false&f.foo.facet.prefix.foo.limit=100
>
> No, that's just the thing - multiple prefixes specified this way would
> be the same logical facet.. hence no way to specify different limits,
> sorts, or whatever... those are per facet.
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Multiple facet.prefix?

2009-08-20 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 2:10 AM, Avlesh Singh wrote:
> Different ways to look at it, Yonik. For the things that I have needed this
> for so far, (field:A* OR field:B*) was just fine.

So if you asked for the top 10, you would be fine with receiving all
A* or all B* if those happened to be the top?  Or do you really want
the top 10 A* and the top 10 B*?

> Anyways, for me to file the ticket, what usage are we looking for? As
> underneath?
> facet.field=myField&f.myField.facet.prefix=foo&f.myField.facet.prefix=bar
>
> How would sort, limit and other parameters be passed? As underneath?
> f.foo.facet.prefix.foo.sort=false&f.foo.facet.prefix.foo.limit=100

No, that's just the thing - multiple prefixes specified this way would
be the same logical facet.. hence no way to specify different limits,
sorts, or whatever... those are per facet.

-Yonik
http://www.lucidimagination.com


Re: Multiple facet.prefix?

2009-08-20 Thread Avlesh Singh
Different ways to look at it, Yonik. For the things that I have needed this
for so far, (field:A* OR field:B*) was just fine.

Anyway, for me to file the ticket, what usage are we looking for? As
below?
facet.field=myField&f.myField.facet.prefix=foo&f.myField.facet.prefix=bar

How would sort, limit, and other parameters be passed? As below?
f.foo.facet.prefix.foo.sort=false&f.foo.facet.prefix.foo.limit=100

Cheers
Avlesh

On Fri, Aug 21, 2009 at 11:20 AM, Yonik Seeley
wrote:

> Of course if you want the top 10 starting with "A" and the top 10
> starting with "B" then they are logically different facets.  It's not
> equivalent to including a filter of (field:A* OR field:B*).
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Multiple facet.prefix?

2009-08-20 Thread Yonik Seeley
Of course if you want the top 10 starting with "A" and the top 10
starting with "B" then they are logically different facets.  It's not
equivalent to including a filter of (field:A* OR field:B*).

-Yonik
http://www.lucidimagination.com
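[Editor's sketch: the distinction Yonik draws can be seen with a toy term/count list. The counts are invented for illustration; real facet counts come from the term index.]

```python
from collections import Counter

# Invented facet counts for a single field.
counts = Counter({"apple": 50, "banana": 40, "avocado": 3, "berry": 2})

def top_n(counter, n, prefixes=None):
    """Top-n facet values by count, optionally restricted to prefixes."""
    items = counter.most_common()
    if prefixes is not None:
        items = [(t, c) for t, c in items if t.startswith(tuple(prefixes))]
    return items[:n]

# One facet filtered to (a* OR b*): whichever terms are globally most
# frequent win, regardless of prefix.
combined = top_n(counts, 2, prefixes=("a", "b"))

# Two logically separate facets: top 2 for each prefix independently.
top_a = top_n(counts, 2, prefixes=("a",))
top_b = top_n(counts, 2, prefixes=("b",))
```

With these counts, the filtered facet returns only "apple" and "banana", while the per-prefix facets also surface the low-count "avocado" and "berry" — which is why the two syntaxes are not equivalent.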


[jira] Updated: (SOLR-1356) Add support for Lucene's persian analysis

2009-08-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1356:
--

Attachment: SOLR-1356.patch

factory for the filter, and schema.xml examples (maybe unnecessary, feel free 
to ignore)

> Add support for Lucene's persian analysis
> -
>
> Key: SOLR-1356
> URL: https://issues.apache.org/jira/browse/SOLR-1356
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Robert Muir
>Priority: Minor
> Attachments: SOLR-1356.patch
>
>
> In this case, only a factory for the PersianNormalizationFilter (LUCENE-1628) 
> is needed.
> But the stopwords are very important, many are not really words such as the 
> plural ها
> So, an example showing how to load these from the jar file (similar to 
> SOLR-1336) should do the trick.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Multiple facet.prefix?

2009-08-20 Thread Avlesh Singh
Hmmm ... it went a lot further than I thought :)

i think Avlesh is just suggesting that...
> facet.field=foo & f.foo.facet.prefix=a & f.foo.facet.prefix=z
>
You  are absolutely right, Hoss.

Until Yonik mentioned the "sort" behavior of facets, it never came into my
mind. My SQL analogy was .. "select ... where facet_field like "foo_%" or
facet_field like "bar_%" order by facetCount". Ordering between these was
something that I did not naturally desire. It's a nice-to-have thing for
sure.

Should I create a Jira issue for this enhancement?

Cheers
Avlesh

On Fri, Aug 21, 2009 at 11:04 AM, Chris Hostetter
wrote:

>
> : Ah, yeah... in this case the client would be able to determine which
> : was which by the prefix of the label.  But it brings up interesting
> : points about ordering... should all of prefix "A" come before prefix
> : "B" or should they be sorted by frequency.  If it's the latter case,
>
> i don't see why the facet.sort param wouldn't dictate that, just like i
> would expect facet.limit to keep working.
>
> : then it seems like it's more the case of wanting an additional filter
> : applied to the facet, which we should also support at some point.
>
> it's just a simple matter of programming ... and a hard matter of finding
> the time/energy.
>
>
>
> -Hoss
>
>


Re: Multiple facet.prefix?

2009-08-20 Thread Chris Hostetter

: Ah, yeah... in this case the client would be able to determine which
: was which by the prefix of the label.  But it brings up interesting
: points about ordering... should all of prefix "A" come before prefix
: "B" or should they be sorted by frequency.  If it's the latter case,

i don't see why the facet.sort param wouldn't dictate that, just like i 
would expect facet.limit to keep working.

: then it seems like it's more the case of wanting an additional filter
: applied to the facet, which we should also support at some point.

it's just a simple matter of programming ... and a hard matter of finding 
the time/energy.



-Hoss



Re: Updating Lucene/Solr home pages with news of the Solr book

2009-08-20 Thread Chris Hostetter

: Thanks Hoss for clarification and the specific URLs you referenced.  It's
: awkward to have the site managed in this manner instead of using some sort
: of online CMS/publishing webapp.

one man's trash, ... every CMS i've ever used had me wishing i could just 
edit text files and check them in. (and this way, it's easy to include 
copies of all docs in releases)

: I've updated a couple xml files and added an image.  Erik, can you simply
: accept them from me in an email to you or

...create a jira issue, just like any other patch.


-Hoss



Re: Updating Lucene/Solr home pages with news of the Solr book

2009-08-20 Thread David Smiley @MITRE.org

Yes, the book was a ton of work; more than anticipated.  Never again, though
maybe an updated edition some day.

Thanks Hoss for clarification and the specific URLs you referenced.  It's
awkward to have the site managed in this manner instead of using some sort
of online CMS/publishing webapp.

I've updated a couple xml files and added an image.  Erik, can you simply
accept them from me in an email to you or

~ David



hossman wrote:
> 
> 
> : Best way to make this happen is to submit patches to the desired
> projects
> : websites (and of course self add the book to the wiki spots too).
> 
> By which Erik means patches to the website *sources* (not patches to the 
> generated HTML).  Both Solr and the main lucene site use forrest to 
> generate the sites...
> 
> Solr website...
> http://wiki.apache.org/solr/Website_Update_HOWTO
> http://svn.apache.org/repos/asf/lucene/solr/trunk/src/site/
> 
> Main Lucene website...
> http://svn.apache.org/repos/asf/lucene/site/
> 
> You might want to take a look at how LIA got added to the Lucene-Java left 
> nav and submit a patch that does the same thing for Solr's site...
> http://svn.apache.org/viewvc?view=rev&revision=751729
> 
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Updating-Lucene-Solr-home-pages-with-news-of-the-Solr-book-tp25032908p25074276.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Commented: (SOLR-1375) BloomFilter on a field

2009-08-20 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745800#action_12745800
 ] 

Jason Rutherglen commented on SOLR-1375:


Lance,

Thanks for the info. BloomFilters can be ORed together. Hadoop
uses BFs for map side joins, which is similar to this use case.

Centralizing will help when performing millions of id membership
tests, though I'm going to benchmark first and see if the
current patch is good enough.

-J

> BloomFilter on a field
> --
>
> Key: SOLR-1375
> URL: https://issues.apache.org/jira/browse/SOLR-1375
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1375.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> * A bloom filter is a read-only probabilistic set. It's useful
> for verifying a key exists in a set, though it returns false
> positives. http://en.wikipedia.org/wiki/Bloom_filter 
> * The use case is indexing in Hadoop and checking for duplicates
> against a Solr cluster (which when using term dictionary or a
> query) is too slow and exceeds the time consumed for indexing.
> When a match is found, the host, segment, and term are returned.
> If the same term is found on multiple servers, multiple results
> are returned by the distributed process. (We'll need to add in
> the core name I just realized). 
> * When new segments are created, and commit is called, a new
> bloom filter is generated from a given field (default:id) by
> iterating over the term dictionary values. There's a bloom
> filter file per segment, which is managed on each Solr shard.
> When segments are merged away, their corresponding .blm files are
> also removed. In a future version we'll have a central server
> for the bloom filters so we're not abusing the thread pool of
> the Solr proxy and the networking of the Solr cluster (this will
> be done sooner than later after testing this version). I held
> off because the central server requires syncing the Solr
> servers' files (which is like reverse replication). 
> * The patch uses the BloomFilter from Hadoop 0.20. I want to jar
> up only the necessary classes so we don't have a giant Hadoop
> jar in lib.
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
> * Distributed code is added and seems to work, I extended
> TestDistributedSearch to test over multiple HTTP servers. I
> chose this approach rather than the manual method used by (for
> example) TermVectorComponent.testDistributed because I'm new to
> Solr's distributed search and wanted to learn how it works (the
> stages are confusing). Using this method, I didn't need to set up
> multiple tomcat servers and manually execute tests.
> * We need more of the bloom filter options passable via
> solrconfig
> * I'll add more test cases

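[Editor's sketch: the OR-able property Jason mentions follows from Bloom filters being plain bit arrays. This is a toy stand-in, not Hadoop's implementation; sizes and keys are illustrative.]

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions per key over an m-bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key):
        # Derive k independent bit positions from salted MD5 digests.
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False positives possible; false negatives are not.
        return all((self.bits >> p) & 1 for p in self._positions(key))

    def union(self, other):
        # ORing the bit arrays yields a filter equivalent to adding both
        # key sets -- this is why per-shard filters can be merged.
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged

a, b = BloomFilter(), BloomFilter()
a.add("doc-1")
b.add("doc-2")
merged = a.union(b)
```

The union only works when both filters share the same size and hash count, which is the constraint a centralized or per-segment scheme would have to preserve.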



[jira] Commented: (SOLR-1335) load core properties from a properties file

2009-08-20 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745799#action_12745799
 ] 

Noble Paul commented on SOLR-1335:
--

bq.would be nice to be able to conditionally enable/disable, say, /update 
handler by some deploy-time switch

it is not currently possible. I recommend adding an "enable" attribute to each 
of the plugins, as follows:

{code:xml}

{code}

Specifying the properties file in solrconfig is not a good option, because the
properties have to be loaded before solrconfig.xml itself so that the variables
can be replaced when solrconfig is loaded.
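[Editor's sketch: the load-order point — properties must exist before solrconfig.xml is parsed so its variables can be substituted — can be shown with a simplified substitution pass. The `${name:default}` form and the sample requestHandler line are illustrative assumptions, not Solr's exact implementation.]

```python
import re

def substitute(text, props):
    """Replace ${name} or ${name:default} with property values."""
    def repl(m):
        name, sep, default = m.group(1).partition(":")
        if name in props:
            return props[name]
        if sep:  # a default was supplied after the colon
            return default
        raise KeyError(name)
    return re.sub(r"\$\{([^}]+)\}", repl, text)

# Hypothetical config line using an "enable" attribute driven by a property.
config = '<requestHandler name="/update" enable="${enable.update:true}"/>'
print(substitute(config, {}))
print(substitute(config, {"enable.update": "false"}))
```

Because substitution happens while the config text is read, the properties file has to be located and loaded first — which is exactly why pointing at it from inside solrconfig.xml cannot work.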



> load core properties from a properties file
> ---
>
> Key: SOLR-1335
> URL: https://issues.apache.org/jira/browse/SOLR-1335
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1335.patch, SOLR-1335.patch, SOLR-1335.patch
>
>
> There are a few ways of loading properties at runtime:
> # using a system property on the command line
> # if you use multicore, dropping it in solr.xml
> If neither applies, the only way is to keep a separate solrconfig.xml for each 
> instance.
> #1 is error prone if the user fails to start with the correct system 
> property. 
> In our case we have four different configurations for the same deployment, 
> and we have to disable replication of solrconfig.xml. 
> It would be nice if I could distribute four properties files so that our ops 
> can drop in the right one and start Solr. It is also possible for operations 
> to edit a properties file, but it is risky to edit solrconfig.xml without 
> understanding Solr.
> I propose a properties file in the instancedir named solrcore.properties. If 
> present, it would be loaded and its entries added as core-specific properties.




Re: Multiple facet.prefix?

2009-08-20 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 1:02 AM, Chris Hostetter wrote:
>
> : > Or, may be just multiple facet.prefix values, Yonik; exactly the same as
> : > facet.field works.
> :
> : The problem then becomes labels.
> : facet.field=foo goes under the label "foo"
> :
> : We already have a mechanism to re-label results, seems like we should use 
> that?
>
> i think Avlesh is just suggesting that...
>
>    facet.field=foo & f.foo.facet.prefix=a & f.foo.facet.prefix=z
>
> ...should result in a single facet.field result section for "foo" with the
> values that start with "a" and the values that start with "z"
>
> it seems like in an ideal world, both syntaxes would work -- the one above
> to make a single section, and the one below to have two separate
> sections...

Ah, yeah... in this case the client would be able to determine which
was which by the prefix of the label.  But it brings up interesting
points about ordering... should all of prefix "A" come before prefix
"B" or should they be sorted by frequency.  If it's the latter case,
then it seems like it's more the case of wanting an additional filter
applied to the facet, which we should also support at some point.

-Yonik
http://www.lucidimagination.com


Re: Multiple facet.prefix?

2009-08-20 Thread Chris Hostetter

: > Or, may be just multiple facet.prefix values, Yonik; exactly the same as
: > facet.field works.
: 
: The problem then becomes labels.
: facet.field=foo goes under the label "foo"
: 
: We already have a mechanism to re-label results, seems like we should use 
that?

i think Avlesh is just suggesting that...

facet.field=foo & f.foo.facet.prefix=a & f.foo.facet.prefix=z

...should result in a single facet.field result section for "foo" with the 
values that start with "a" and the values that start with "z"

it seems like in an ideal world, both syntaxes would work -- the one above 
to make a single section, and the one below to have two separate 
sections...

: >> I think that once again, local params are the answer... the same as
: >> they were with faceting on a single field but excluding different
: >> filters.
: >>
: >> facet.field={!prefix=foo key=label1}myfield
: >> facet.field={!prefix=bar key=label2}myfield



-Hoss
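[Editor's sketch: the `{!prefix=... key=...}` form quoted above is Solr's local-params syntax. A simplified parser shows how the key/value pairs and the trailing field name separate; real Solr also handles quoted values and `$param` dereferencing, omitted here.]

```python
import re

def parse_local_params(value):
    """Split '{!k1=v1 k2=v2}rest' into (params dict, remainder)."""
    m = re.match(r"\{!([^}]*)\}(.*)", value)
    if not m:
        return {}, value  # no local params present
    params = dict(p.split("=", 1) for p in m.group(1).split())
    return params, m.group(2)

params, field = parse_local_params("{!prefix=foo key=label1}myfield")
```

Under this reading, two `facet.field` values over the same field but with different `key` params naturally produce two separately labeled result sections.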



[jira] Updated: (SOLR-1362) WordDelimiterFilter position increment bug

2009-08-20 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1362:
--

Attachment: SOLR-1362_tests.txt

I started working on a patch, but found the existing behavior to be stranger 
than I originally thought.

There are also some bugs in the existing behavior, completely separate but 
along the same lines as this issue.

So here are some tests, let me know what you think...

> WordDelimiterFilter position increment bug
> --
>
> Key: SOLR-1362
> URL: https://issues.apache.org/jira/browse/SOLR-1362
> Project: Solr
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
>Priority: Minor
> Attachments: SOLR-1362.patch, SOLR-1362_tests.txt
>
>
> WordDelimiterFilter sometimes assigns high position increment values, which 
> inhibits phrase matches.
> If this is a feature and not a bug please change the issue type, and I will 
> change the patch to propose this as an option...




[jira] Commented: (SOLR-914) Presence of finalize() in the codebase

2009-08-20 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745790#action_12745790
 ] 

Noble Paul commented on SOLR-914:
-

I guess this should be enough.

> Presence of finalize() in the codebase 
> ---
>
> Key: SOLR-914
> URL: https://issues.apache.org/jira/browse/SOLR-914
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
> Environment: Tomcat 6, JRE 6
>Reporter: Kay Kay
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-914.patch, SOLR-914.patch
>
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>
> There seem to be a number of classes that implement the finalize() method.
> Given that it is perfectly OK for a Java VM never to call it, maybe there has 
> to be some other way { try .. finally, at the point they are created } to 
> destroy them; the presence of a finalize() method, depending on its 
> implementation, might not serve what we want and in some cases can end up 
> delaying the gc process, depending on the algorithms. 
> $ find . -name *.java | xargs grep finalize
> ./contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/JdbcDataSource.java:
>   protected void finalize() {
> ./src/java/org/apache/solr/update/SolrIndexWriter.java:  protected void 
> finalize() {
> ./src/java/org/apache/solr/core/CoreContainer.java:  protected void 
> finalize() {
> ./src/java/org/apache/solr/core/SolrCore.java:  protected void finalize() {
> ./src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:  protected 
> void finalize() throws Throwable {
> Maybe we need to revisit these occurrences from a design perspective to see 
> if they are necessary, or if there is an alternate way of managing guaranteed 
> destruction of resources. 
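[Editor's sketch: the try/finally pattern the issue suggests, shown in Python for brevity (Solr itself is Java; the `Resource` class is a made-up stand-in). Cleanup runs deterministically at the creation site instead of waiting for a finalizer the VM may never invoke.]

```python
class Resource:
    """Stand-in for something holding a native or OS resource."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

r = Resource()
try:
    pass  # ... use the resource ...
finally:
    # Runs even if the body raises -- unlike finalize(), which the
    # garbage collector is free to delay or skip entirely.
    r.close()
```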




[jira] Issue Comment Edited: (SOLR-1375) BloomFilter on a field

2009-08-20 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745774#action_12745774
 ] 

Lance Norskog edited comment on SOLR-1375 at 8/20/09 8:08 PM:
--

At my previous job, we were attempting to add the same document up to 100x per 
day. We used an MD5 signature for the id and made a bitmap file to pre-check 
ids before attempting to add them. Because we created a bitmap file with 2^32 
bits (512MB) instead of 2^128, we also had a false-positive problem which we 
were willing to put up with. (It was OK if we did not add all 
documents.)

We also had the advantage that different feeding machines pulled documents from 
different sources, and so machine A's set of repeated documents was separate 
from machine B's. Therefore, each could keep its own bitmap file and the files 
could be OR'd together periodically in the background. 

I can't recommend what we did. If you like the Bloom Filter for this problem, 
that's great. 

This project: [FastBits 
IBIS|http://crd.lbl.gov/~kewu/fastbit/doc/html/index.html] claims to be 
super-smart about compressing bits in a disk archive. It might be a better 
technology than the Nutch Bloom Filter, but there is no Java and the C is a 
different license.

I would counsel against making a central server; Solr technologies should be 
as distributed and localized (close to the Solr instance) as possible.
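[Editor's sketch: Lance's MD5-bitmap approach, with the 2^32-bit file shrunk to 2^16 bits to keep the example small. Document ids and sizes are illustrative.]

```python
import hashlib

M_BITS = 1 << 16  # the real deployment used 2**32 bits (512MB)

def bit_index(doc_id):
    """Hash a document id to a single bit position via MD5."""
    digest = hashlib.md5(doc_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % M_BITS

class Bitmap:
    def __init__(self):
        self.buf = bytearray(M_BITS // 8)

    def add(self, doc_id):
        i = bit_index(doc_id)
        self.buf[i // 8] |= 1 << (i % 8)

    def seen(self, doc_id):
        # One bit per id: cheaper than a Bloom filter but with a higher
        # false-positive rate for the same number of bits.
        i = bit_index(doc_id)
        return bool(self.buf[i // 8] & (1 << (i % 8)))

    def or_merge(self, other):
        # Feeding machines keep independent bitmaps and OR them
        # periodically, exactly as described above.
        self.buf = bytearray(x | y for x, y in zip(self.buf, other.buf))

a, b = Bitmap(), Bitmap()
a.add("doc-A")
b.add("doc-B")
a.or_merge(b)
```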






  was (Author: lancenorskog):
At my previous job, we were attempting to add the same document up to 100x 
per day. We used an MD5 signature for the id and made a bitmap file to 
pre-check ids before attempting to add them. Because we did not created a 
bitmap file with 2^32 bits (512M) instead of (2^128) we also had a false 
positive problem which we were willing to put up with. (It was ok if we did not 
add all documents. )

We also had the advantage that different feeding machines pulled documents from 
different sources, and so machine A's set of repeated documents was separate 
from machine B's. Therefore, each could keep its own bitmap file and the files 
could be OR'd together periodically in background. 

I can't recommend what we did. If you like the Bloom Filter for this problem, 
that's great. 

This project: [FastBits 
IBIS|http://crd.lbl.gov/~kewu/fastbit/doc/html/index.html] claims to be 
super-smart about compressing bits in a disk archive. It might be a better 
technology than the Nutch Bloom Filter, but who cares.





  
> BloomFilter on a field
> --
>
> Key: SOLR-1375
> URL: https://issues.apache.org/jira/browse/SOLR-1375
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1375.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> * A bloom filter is a read-only probabilistic set. It's useful
> for verifying a key exists in a set, though it returns false
> positives. http://en.wikipedia.org/wiki/Bloom_filter 
> * The use case is indexing in Hadoop and checking for duplicates
> against a Solr cluster (which when using term dictionary or a
> query) is too slow and exceeds the time consumed for indexing.
> When a match is found, the host, segment, and term are returned.
> If the same term is found on multiple servers, multiple results
> are returned by the distributed process. (We'll need to add in
> the core name I just realized). 
> * When new segments are created, and commit is called, a new
> bloom filter is generated from a given field (default:id) by
> iterating over the term dictionary values. There's a bloom
> filter file per segment, which is managed on each Solr shard.
> When segments are merged away, their corresponding .blm files are
> also removed. In a future version we'll have a central server
> for the bloom filters so we're not abusing the thread pool of
> the Solr proxy and the networking of the Solr cluster (this will
> be done sooner than later after testing this version). I held
> off because the central server requires syncing the Solr
> servers' files (which is like reverse replication). 
> * The patch uses the BloomFilter from Hadoop 0.20. I want to jar
> up only the necessary classes so we don't have a giant Hadoop
> jar in lib.
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
> * Distributed code is added and seems to work, I extended
> TestDistributedSearch to test over multiple HTTP servers. I
> chose this approach rather than the manual method used by (for
> example) TermVectorComponent.testDistributed because I'm new to
> Solr's distributed search and wanted to learn how it works (the
> stages are confusing). Using this method, I didn't need to set up
> multiple tomcat servers and manually execute tests.
> * We need more of the bloom filter options passable via
> solrconfig
> * I'll add more test cases

[jira] Commented: (SOLR-1375) BloomFilter on a field

2009-08-20 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745774#action_12745774
 ] 

Lance Norskog commented on SOLR-1375:
-

At my previous job, we were attempting to add the same document up to 100x per 
day. We used an MD5 signature for the id and made a bitmap file to pre-check 
ids before attempting to add them. Because we created a bitmap file with 2^32 
bits (512MB) instead of 2^128, we also had a false-positive problem which we 
were willing to put up with. (It was OK if we did not add all 
documents.)
We also had the advantage that different feeding machines pulled documents from 
different sources, and so machine A's set of repeated documents was separate 
from machine B's. Therefore, each could keep its own bitmap file and the files 
could be OR'd together periodically in background. 
I can't recommend what we did. If you like the Bloom Filter for this problem, 
that's great. 
This project: [FastBits 
IBIS|http://crd.lbl.gov/~kewu/fastbit/doc/html/index.html] claims to be 
super-smart about compressing bits in a disk archive. It might be a better 
technology than the Nutch Bloom Filter, but who cares.






> BloomFilter on a field
> --
>
> Key: SOLR-1375
> URL: https://issues.apache.org/jira/browse/SOLR-1375
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1375.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> * A bloom filter is a read-only probabilistic set. It's useful
> for verifying a key exists in a set, though it returns false
> positives. http://en.wikipedia.org/wiki/Bloom_filter 
> * The use case is indexing in Hadoop and checking for duplicates
> against a Solr cluster (which when using term dictionary or a
> query) is too slow and exceeds the time consumed for indexing.
> When a match is found, the host, segment, and term are returned.
> If the same term is found on multiple servers, multiple results
> are returned by the distributed process. (We'll need to add in
> the core name I just realized). 
> * When new segments are created, and commit is called, a new
> bloom filter is generated from a given field (default:id) by
> iterating over the term dictionary values. There's a bloom
> filter file per segment, which is managed on each Solr shard.
> When segments are merged away, their corresponding .blm files are
> also removed. In a future version we'll have a central server
> for the bloom filters so we're not abusing the thread pool of
> the Solr proxy and the networking of the Solr cluster (this will
> be done sooner than later after testing this version). I held
> off because the central server requires syncing the Solr
> servers' files (which is like reverse replication). 
> * The patch uses the BloomFilter from Hadoop 0.20. I want to jar
> up only the necessary classes so we don't have a giant Hadoop
> jar in lib.
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
> * Distributed code is added and seems to work, I extended
> TestDistributedSearch to test over multiple HTTP servers. I
> chose this approach rather than the manual method used by (for
> example) TermVectorComponent.testDistributed because I'm new to
> Solr's distributed search and wanted to learn how it works (the
> stages are confusing). Using this method, I didn't need to set up
> multiple tomcat servers and manually execute tests.
> * We need more of the bloom filter options passable via
> solrconfig
> * I'll add more test cases




[jira] Issue Comment Edited: (SOLR-1375) BloomFilter on a field

2009-08-20 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745774#action_12745774
 ] 

Lance Norskog edited comment on SOLR-1375 at 8/20/09 8:04 PM:
--

At my previous job, we were attempting to add the same document up to 100x per 
day. We used an MD5 signature for the id and made a bitmap file to pre-check 
ids before attempting to add them. Because we created a bitmap file with 2^32 
bits (512MB) instead of 2^128, we also had a false-positive problem which we 
were willing to put up with. (It was OK if we did not add all 
documents.)

We also had the advantage that different feeding machines pulled documents from 
different sources, and so machine A's set of repeated documents was separate 
from machine B's. Therefore, each could keep its own bitmap file and the files 
could be OR'd together periodically in background. 

I can't recommend what we did. If you like the Bloom Filter for this problem, 
that's great. 

This project: [FastBits 
IBIS|http://crd.lbl.gov/~kewu/fastbit/doc/html/index.html] claims to be 
super-smart about compressing bits in a disk archive. It might be a better 
technology than the Nutch Bloom Filter, but who cares.






> BloomFilter on a field
> --
>
> Key: SOLR-1375
> URL: https://issues.apache.org/jira/browse/SOLR-1375
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1375.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> * A bloom filter is a read only probabilistic set. Its useful
> for verifying a key exists in a set, though it returns false
> positives. http://en.wikipedia.org/wiki/Bloom_filter 
> * The use case is indexing in Hadoop and checking for duplicates
> against a Solr cluster (which when using term dictionary or a
> query) is too slow and exceeds the time consumed for indexing.
> When a match is found, the host, segment, and term are returned.
> If the same term is found on multiple servers, multiple results
> are returned by the distributed process. (We'll need to add in
> the core name I just realized). 
> * When new segments are created, and commit is called, a new
> bloom filter is generated from a given field (default:id) by
> iterating over the term dictionary values. There's a bloom
> filter file per segment, which is managed on each Solr shard.
> When segments are merged away, their corresponding .blm files is
> also removed. In a future version we'll have a central server
> for the bloom filters so we're not abusing the thread pool of
> the Solr proxy and the networking of the Solr cluster (this will
> be done sooner than later after testing this version). I held
> off because the central server requires syncing the Solr
> servers' files (which is like reverse replication). 
> * The patch uses the BloomFilter from Hadoop 0.20. I want to jar
> up only the necessary classes so we don't have a giant Hadoop
> jar in lib.
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
> * Distributed code is added and seems to work, I extended
> TestDistributedSearch to test over multiple HTTP servers. I
> chose this approach rather than the manual method used by (for
> example) TermVectorComponent.testDistributed because I'm new to
> Solr's distributed search and wanted to learn how it works (the
> stages are confusing). Using this method, I didn't need to setup
> multiple tomcat servers and manually execute tests.
> * We need more of the bloom filter options passable via
> solrconfig
> * I'll add more test cases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (SOLR-780) Enhance SpellCheckComponent to automatically build on firstSearcher in case of RAMDirectory based indices

2009-08-20 Thread Alex Baranov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745773#action_12745773
 ] 

Alex Baranov edited comment on SOLR-780 at 8/20/09 7:44 PM:


Attached patch with the fix.
Unit-test needs to be added.

> Enhance SpellCheckComponent to automatically build on firstSearcher in case 
> of RAMDirectory based indices
> -
>
> Key: SOLR-780
> URL: https://issues.apache.org/jira/browse/SOLR-780
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
> Environment: Java
>Reporter: Oleg Gnatovskiy
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-780.patch
>
>
> The spellcheck compnent currently does not automatically reload when a 
> RAMDirectory is used. Is it possible to reload a RAMDirectory spell check 
> index using firstSearcher?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-780) Enhance SpellCheckComponent to automatically build on firstSearcher in case of RAMDirectory based indices

2009-08-20 Thread Alex Baranov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Baranov updated SOLR-780:
--

Attachment: SOLR-780.patch

Attached patch with the fix.
Unit-test needs to be added.

> Enhance SpellCheckComponent to automatically build on firstSearcher in case 
> of RAMDirectory based indices
> -
>
> Key: SOLR-780
> URL: https://issues.apache.org/jira/browse/SOLR-780
> Project: Solr
>  Issue Type: Improvement
>  Components: spellchecker
> Environment: Java
>Reporter: Oleg Gnatovskiy
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-780.patch
>
>
> The spellcheck compnent currently does not automatically reload when a 
> RAMDirectory is used. Is it possible to reload a RAMDirectory spell check 
> index using firstSearcher?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Multiple facet.prefix?

2009-08-20 Thread Yonik Seeley
On Thu, Aug 20, 2009 at 10:13 PM, Avlesh Singh wrote:
> Or, may be just multiple facet.prefix values, Yonik; exactly the same as
> facet.field works.

The problem then becomes labels.
facet.field=foo goes under the label "foo"

We already have a mechanism to re-label results; it seems like we should use that.

-Yonik
http://www.lucidimagination.com

> Cheers
> Avlesh
>
> On Fri, Aug 21, 2009 at 7:39 AM, Yonik Seeley 
> wrote:
>
>> On Thu, Aug 20, 2009 at 9:53 PM, Avlesh Singh wrote:
>> > Should multiple values for facet.prefix be supported?
>> > I have come across several use-cases on the user mailing list where such
>> a
>> > functionality could have helped (The latest one being -
>> >
>> http://www.lucidimagination.com/search/document/2a9c44d4f015b5e5/facet_filtering
>> ).
>> > I ran into one such use last night.
>> >
>> > Is there a general agreement on the enhancement?
>>
>> Yes, I think it's a good idea.
>> It's just that the current syntax doesn't quite support it yet.
>>
>> I think that once again, local params are the answer... the same as
>> they were with faceting on a single field but excluding different
>> filters.
>>
>> facet.field={!prefix=foo key=label1}myfield
>> facet.field={!prefix=bar key=label2}myfield
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>


Re: Multiple facet.prefix?

2009-08-20 Thread Avlesh Singh
Or, maybe just multiple facet.prefix values, Yonik; exactly the same way
facet.field works.

Cheers
Avlesh

On Fri, Aug 21, 2009 at 7:39 AM, Yonik Seeley wrote:

> On Thu, Aug 20, 2009 at 9:53 PM, Avlesh Singh wrote:
> > Should multiple values for facet.prefix be supported?
> > I have come across several use-cases on the user mailing list where such
> a
> > functionality could have helped (The latest one being -
> >
> http://www.lucidimagination.com/search/document/2a9c44d4f015b5e5/facet_filtering
> ).
> > I ran into one such use last night.
> >
> > Is there a general agreement on the enhancement?
>
> Yes, I think it's a good idea.
> It's just that the current syntax doesn't quite support it yet.
>
> I think that once again, local params are the answer... the same as
> they were with faceting on a single field but excluding different
> filters.
>
> facet.field={!prefix=foo key=label1}myfield
> facet.field={!prefix=bar key=label2}myfield
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Multiple facet.prefix?

2009-08-20 Thread Yonik Seeley
On Thu, Aug 20, 2009 at 9:53 PM, Avlesh Singh wrote:
> Should multiple values for facet.prefix be supported?
> I have come across several use-cases on the user mailing list where such a
> functionality could have helped (The latest one being -
> http://www.lucidimagination.com/search/document/2a9c44d4f015b5e5/facet_filtering).
> I ran into one such use last night.
>
> Is there a general agreement on the enhancement?

Yes, I think it's a good idea.
It's just that the current syntax doesn't quite support it yet.

I think that once again, local params are the answer... the same as
they were with faceting on a single field but excluding different
filters.

facet.field={!prefix=foo key=label1}myfield
facet.field={!prefix=bar key=label2}myfield

-Yonik
http://www.lucidimagination.com
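
For illustration, the two proposed facet.field values could be assembled on the
client like this (a sketch; the field name, prefixes, and labels are the
hypothetical ones from the example above):

```python
from urllib.parse import urlencode

# Two facet.field parameters over the same field, each carrying its own
# prefix local param and a distinct "key" label so the results don't collide.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "{!prefix=foo key=label1}myfield"),
    ("facet.field", "{!prefix=bar key=label2}myfield"),
]
query_string = urlencode(params)
print(query_string)
```

The idea being that the response would then list the two prefix-restricted
facet counts under label1 and label2 rather than twice under myfield.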


Multiple facet.prefix?

2009-08-20 Thread Avlesh Singh
Should multiple values for facet.prefix be supported?
I have come across several use cases on the user mailing list where such
functionality could have helped (the latest one being
http://www.lucidimagination.com/search/document/2a9c44d4f015b5e5/facet_filtering).
I ran into one such use last night.

Is there a general agreement on the enhancement?

Cheers
Avlesh


[jira] Commented: (SOLR-1376) invalid links to solr indexes after a new index is created

2009-08-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745766#action_12745766
 ] 

Hoss Man commented on SOLR-1376:


Without more info, there is no actual evidence of a bug.

Lucene/Solr keeps open references to deleted files all the time as part of 
normal searching -- that's how a consistent view of the index is presented to 
all users of an IndexSearcher while background updates (which may result in 
merged segments, which result in deleted files) are taking place.

Triggering a commit closes the old searcher and opens a new one; when the old 
searcher closes, the handles to the (deleted) files are closed.

You didn't mention whether you were doing any commits -- if you have evidence 
that the list of open-but-deleted files grows over time even as commits happen, 
then there's a potential problem, but since I can't reproduce that we'll need 
more specifics about your environment, and what exactly you mean by "do a 
incremental indexing"
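
One quick way to gather the evidence asked for above is to count deleted-file
handles across commits. A small sketch that parses `ls -la /proc/<pid>/fd`
output (the sample lines and paths below are made up):

```python
def count_deleted_handles(ls_output):
    """Count symlinks in an `ls -la /proc/<pid>/fd` listing that point at
    deleted files. A steadily growing count across commits would suggest a
    leak; a stable count is just normal searcher behavior, as noted above."""
    return sum(1 for line in ls_output.splitlines()
               if line.rstrip().endswith("(deleted)"))

sample = """\
lr-x------ 1 solr roleusers 64 Jul 23 17:31 75 -> /home/solr/solrhome/data/index/_kja.fdx (deleted)
lr-x------ 1 solr roleusers 64 Jul 23 17:31 76 -> /home/solr/solrhome/data/index/_kk4.tis (deleted)
lr-x------ 1 solr roleusers 64 Jul 23 17:31 80 -> /home/solr/solrhome/data/index/_kk5.fdt
"""
print(count_deleted_handles(sample))  # 2
```

Running this periodically (capturing `ls -la /proc/$PID/fd`) before and after
commits would show whether the count actually grows.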

> invalid links to solr indexes after a new index is created
> --
>
> Key: SOLR-1376
> URL: https://issues.apache.org/jira/browse/SOLR-1376
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: kiran sugana
> Fix For: 1.4
>
>
> After new index is created, it does not delete the links to the old indexes, 
> To recreate the issue, 
> 1) do a incremental indexing 
> 2) cd /proc/[JAVA_PID]/fd
> 3) ls -la
> {code}
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 75 -> 
> /home//solrhome/data/index/_kja.fdx (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 76 -> 
> /home/./solrhome/data/index/_kk4.tis (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 78 -> 
> /home//solrhome/data/index/_kk4.frq (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 79 -> 
> /home//solrhome/data/index/_kk4.prx (deleted)
> {code}
> This is creating performance issues, (search slows down significantly) 
> Temp Resolution:
>  Restart solr

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1376) invalid links to solr indexes after a new index is created

2009-08-20 Thread kiran sugana (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kiran sugana updated SOLR-1376:
---

Description: 
After new index is created, it does not delete the links to the old indexes, 

To recreate the issue, 
1) do a incremental indexing 
2) cd /proc/[JAVA_PID]/fd
3) ls -la
{code}
lr-x-- 1 solr roleusers 64 Jul 23 17:31 75 -> 
/home//solrhome/data/index/_kja.fdx (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 76 -> 
/home/./solrhome/data/index/_kk4.tis (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 78 -> 
/home//solrhome/data/index/_kk4.frq (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 79 -> 
/home//solrhome/data/index/_kk4.prx (deleted)
{code}

This is creating performance issues (search slows down significantly).

Temp Resolution:
 Restart Solr



  was:
After new index is created, it does not delete the links to the old indexes, 

To recreate the issue, 
1) do a incremental indexing 
2) cd /proc/[java pid/fd
3) ls -la
{code}
lr-x-- 1 solr roleusers 64 Jul 23 17:31 75 -> 
/home//solrhome/data/index/_kja.fdx (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 76 -> 
/home/./solrhome/data/index/_kk4.tis (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 78 -> 
/home//solrhome/data/index/_kk4.frq (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 79 -> 
/home//solrhome/data/index/_kk4.prx (deleted)
{code}

This is creating performance issues, (search slows down significantly) 

Temp Resolution:
 Restart solr




> invalid links to solr indexes after a new index is created
> --
>
> Key: SOLR-1376
> URL: https://issues.apache.org/jira/browse/SOLR-1376
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: kiran sugana
> Fix For: 1.4
>
>
> After new index is created, it does not delete the links to the old indexes, 
> To recreate the issue, 
> 1) do a incremental indexing 
> 2) cd /proc/[JAVA_PID]/fd
> 3) ls -la
> {code}
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 75 -> 
> /home//solrhome/data/index/_kja.fdx (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 76 -> 
> /home/./solrhome/data/index/_kk4.tis (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 78 -> 
> /home//solrhome/data/index/_kk4.frq (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 79 -> 
> /home//solrhome/data/index/_kk4.prx (deleted)
> {code}
> This is creating performance issues, (search slows down significantly) 
> Temp Resolution:
>  Restart solr

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1376) invalid links to solr indexes after a new index is created

2009-08-20 Thread kiran sugana (JIRA)
invalid links to solr indexes after a new index is created
--

 Key: SOLR-1376
 URL: https://issues.apache.org/jira/browse/SOLR-1376
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.3
Reporter: kiran sugana
 Fix For: 1.4


After a new index is created, it does not delete the links to the old index files.

To recreate the issue:
1) do an incremental indexing
2) cd /proc/[JAVA_PID]/fd
3) ls -la
{code}
lr-x-- 1 solr roleusers 64 Jul 23 17:31 75 -> 
/home//solrhome/data/index/_kja.fdx (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 76 -> 
/home/./solrhome/data/index/_kk4.tis (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 78 -> 
/home//solrhome/data/index/_kk4.frq (deleted)
lr-x-- 1 solr roleusers 64 Jul 23 17:31 79 -> 
/home//solrhome/data/index/_kk4.prx (deleted)
{code}

This is creating performance issues (search slows down significantly).

Temp Resolution:
 Restart Solr



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1375) BloomFilter on a field

2009-08-20 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1375:
---

Attachment: SOLR-1375.patch

Patch file

> BloomFilter on a field
> --
>
> Key: SOLR-1375
> URL: https://issues.apache.org/jira/browse/SOLR-1375
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1375.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> * A bloom filter is a read only probabilistic set. Its useful
> for verifying a key exists in a set, though it returns false
> positives. http://en.wikipedia.org/wiki/Bloom_filter 
> * The use case is indexing in Hadoop and checking for duplicates
> against a Solr cluster (which when using term dictionary or a
> query) is too slow and exceeds the time consumed for indexing.
> When a match is found, the host, segment, and term are returned.
> If the same term is found on multiple servers, multiple results
> are returned by the distributed process. (We'll need to add in
> the core name I just realized). 
> * When new segments are created, and commit is called, a new
> bloom filter is generated from a given field (default:id) by
> iterating over the term dictionary values. There's a bloom
> filter file per segment, which is managed on each Solr shard.
> When segments are merged away, their corresponding .blm files is
> also removed. In a future version we'll have a central server
> for the bloom filters so we're not abusing the thread pool of
> the Solr proxy and the networking of the Solr cluster (this will
> be done sooner than later after testing this version). I held
> off because the central server requires syncing the Solr
> servers' files (which is like reverse replication). 
> * The patch uses the BloomFilter from Hadoop 0.20. I want to jar
> up only the necessary classes so we don't have a giant Hadoop
> jar in lib.
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
> * Distributed code is added and seems to work, I extended
> TestDistributedSearch to test over multiple HTTP servers. I
> chose this approach rather than the manual method used by (for
> example) TermVectorComponent.testDistributed because I'm new to
> Solr's distributed search and wanted to learn how it works (the
> stages are confusing). Using this method, I didn't need to setup
> multiple tomcat servers and manually execute tests.
> * We need more of the bloom filter options passable via
> solrconfig
> * I'll add more test cases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1375) BloomFilter on a field

2009-08-20 Thread Jason Rutherglen (JIRA)
BloomFilter on a field
--

 Key: SOLR-1375
 URL: https://issues.apache.org/jira/browse/SOLR-1375
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


* A Bloom filter is a read-only probabilistic set. It's useful
for verifying that a key exists in a set, though it can return false
positives. http://en.wikipedia.org/wiki/Bloom_filter 

* The use case is indexing in Hadoop and checking for duplicates
against a Solr cluster, which (when using the term dictionary or a
query) is too slow and exceeds the time consumed for indexing.
When a match is found, the host, segment, and term are returned.
If the same term is found on multiple servers, multiple results
are returned by the distributed process. (We'll need to add in
the core name, I just realized.) 

* When new segments are created, and commit is called, a new
bloom filter is generated from a given field (default:id) by
iterating over the term dictionary values. There's a bloom
filter file per segment, which is managed on each Solr shard.
When segments are merged away, their corresponding .blm files are
also removed. In a future version we'll have a central server
for the bloom filters so we're not abusing the thread pool of
the Solr proxy and the networking of the Solr cluster (this will
be done sooner than later after testing this version). I held
off because the central server requires syncing the Solr
servers' files (which is like reverse replication). 

* The patch uses the BloomFilter from Hadoop 0.20. I want to jar
up only the necessary classes so we don't have a giant Hadoop
jar in lib.
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html

* Distributed code is added and seems to work, I extended
TestDistributedSearch to test over multiple HTTP servers. I
chose this approach rather than the manual method used by (for
example) TermVectorComponent.testDistributed because I'm new to
Solr's distributed search and wanted to learn how it works (the
stages are confusing). Using this method, I didn't need to setup
multiple tomcat servers and manually execute tests.

* We need more of the bloom filter options passable via
solrconfig

* I'll add more test cases

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1374) When a test fails, display the test file in the console via ant junit

2009-08-20 Thread Jason Rutherglen (JIRA)
When a test fails, display the test file in the console via ant junit
-

 Key: SOLR-1374
 URL: https://issues.apache.org/jira/browse/SOLR-1374
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5


When a test fails, it would be great if the junit test output file were 
displayed in the terminal.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-20 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745731#action_12745731
 ] 

Jason Rutherglen commented on SOLR-1275:


I'll check in a new patch that's faster (i.e. indexes fewer docs).

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch, SOLR-1275.patch, SOLR-1275.patch, 
> SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method, somewhat like optimize, offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-20 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745717#action_12745717
 ] 

Jason Rutherglen commented on SOLR-1275:


> Isn't it much simpler 

Calling SR.undelete would remove the deletes and the test would
pass?

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch, SOLR-1275.patch, SOLR-1275.patch, 
> SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method, somewhat like optimize, offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Issue with Function Query on Field Names with a Hyphen

2009-08-20 Thread Chris Hostetter

: So, it looks like it is trying to parse the "-" in the field name as an
: operator instead of as part of the field name.  Is it bad form to include a
: hyphen in a field name... I've never had issues with it anywhere else in the
: past.  FYI, I've confirmed this behavior on 1.4 nightly builds from both
: 6/16 and 8/6 and both forms of the function query syntax ({!func} and
: _val_).  Also, I have this problem with seemingly all function queries (ord,
: sum, etc.), not just "product."

At the lowest level, Solr lets you have any characters you want in 
the field names in the index (even whitespace and unprintable characters), 
but in practice some features don't work well with some characters in 
field names because of the syntax -- i.e., the sort param doesn't work well 
with fields that have whitespace or commas in their names because the sort 
syntax uses whitespace and comma characters.

: Is there any way to escape the field name, or is this just a bug?

I'm a little surprised that the function syntax would freak out about a 
dash in a field name ... off the top of my head I can't think of any 
reason why the syntax would split on a dash like that when looking for a 
field name (a comma in a field name would definitely cause a problem) ... I 
would go ahead and file a bug, but it might be a necessary limitation 
that I'm just not aware of.


-Hoss
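

To make the reported symptom concrete, here is a toy identifier scanner (not
Solr's actual function-query parser, just an illustration) showing how a
grammar that excludes '-' from field names would truncate at the hyphen:

```python
import re

def read_field_name(s):
    """Toy scanner: consume a field name the way a parser that disallows
    '-' in identifiers would, returning (name, rest). A hypothetical
    sketch of the symptom, not Solr's real FunctionQParser."""
    m = re.match(r"[A-Za-z_][A-Za-z0-9_]*", s)
    return (m.group(0), s[m.end():]) if m else ("", s)

print(read_field_name("price_usd,2)"))  # ('price_usd', ',2)')
print(read_field_name("price-usd,2)"))  # ('price', '-usd,2)') -- truncated at '-'
```

The leftover "-usd" would then be handed to the surrounding parser, which
could plausibly try to interpret it as an operator, matching the error seen.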



[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745695#action_12745695
 ] 

Yonik Seeley commented on SOLR-1275:


Isn't it much simpler to just check that the segments have no deletes after 
expungeDeletes is called?
Is there something my proposed patch doesn't test that you think it should?

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch, SOLR-1275.patch, SOLR-1275.patch, 
> SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method, somewhat like optimize, offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1368) ms() function for date math

2009-08-20 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1368.


Resolution: Fixed

> ms() function for date math
> ---
>
> Key: SOLR-1368
> URL: https://issues.apache.org/jira/browse/SOLR-1368
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 1.4
>
> Attachments: SOLR-1368.patch, SOLR-1368.patch
>
>
> ms (milliseconds) function to use dates in function queries

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1373) Add filter query in solr/admin/form.jsp

2009-08-20 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745684#action_12745684
 ] 

Lance Norskog commented on SOLR-1373:
-

Maybe there should be two 'Full Interface' pages? 

The current Full Interface page has the "handing a bomb to a baby" problem for 
very large indexes. For example, some things like sorting or faceting can kill 
a large index. Maybe the super-full page would include a default of 5 seconds 
for the solr timeout?


> Add filter query in solr/admin/form.jsp
> ---
>
> Key: SOLR-1373
> URL: https://issues.apache.org/jira/browse/SOLR-1373
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Hoss Man
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1373.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The full interface needs a filter query text field.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Updated: (SOLR-1366) UnsupportedOperationException may be thrown when using custom IndexReader

2009-08-20 Thread Chris Hostetter

: Shalin Shekhar Mangar updated SOLR-1366:
: 
: 
: Component/s: replication (java)

the issue seems broader than just replication ... I would change this back 
to a generic "search" component, and open new related issue(s) for 
replication (documentation vs. custom reader support) ... some pieces of 
this may make it into 1.4 and some may not, so we'll want to track them 
separately.



-Hoss



[jira] Commented: (SOLR-1335) load core properties from a properties file

2009-08-20 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745680#action_12745680
 ] 

Lance Norskog commented on SOLR-1335:
-

About use cases for this feature:

I would like to use this along with my strange baby, 
[SOLR-1354|http://issues.apache.org/jira/browse/SOLR-1354]. It would allow me 
to push all parameters for an RSS/ATOM feed into a separate configuration file. 
This way, adding an RSS feed to a Solr instance requires editing a properties 
file and nothing else. (The larger goal here is to make it as easy as possible 
to make Solr useful out of the box.)

Another place where properties files would be very useful is in DIH scripts. 
When we want to load multiple shards from the same data source, we need 
different code for each shard. It would be great to have one master DIH file 
and a different properties file for each shard. Each properties file has a 
unique value to define the records for that shard.
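
The per-shard idea above can be sketched as one master template plus a tiny
properties file per shard. The property names, the query template, and the
substitute helper here are all hypothetical, not Solr's or DIH's actual
implementation:

```python
def load_properties(text):
    """Parse simple java-style key=value properties (ignoring comments;
    no escape or line-continuation handling)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def substitute(template, props):
    """Replace ${name} placeholders with property values."""
    for key, value in props.items():
        template = template.replace("${%s}" % key, value)
    return template

# Hypothetical per-shard properties file and master DIH query template:
shard_props = load_properties("shard.min_id=0\nshard.max_id=999999\n")
query = substitute(
    "SELECT * FROM docs WHERE id BETWEEN ${shard.min_id} AND ${shard.max_id}",
    shard_props)
print(query)  # SELECT * FROM docs WHERE id BETWEEN 0 AND 999999
```

Each shard would then ship the same master file and a different properties
file carrying only the values that distinguish its slice of the data.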




> load core properties from a properties file
> ---
>
> Key: SOLR-1335
> URL: https://issues.apache.org/jira/browse/SOLR-1335
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1335.patch, SOLR-1335.patch, SOLR-1335.patch
>
>
> There are  few ways of loading properties in runtime,
> # using env property using in the command line
> # if you use a multicore drop it in the solr.xml
> if not , the only way is to  keep separate solrconfig.xml for each instance.  
> #1 is error prone if the user fails to start with the correct system 
> property. 
> In our case we have four different configurations for the same deployment  . 
> And we have to disable replication of solrconfig.xml. 
> It would be nice if I can distribute four properties file so that our ops can 
> drop  the right one and start Solr. Or it is possible for the operations to 
> edit a properties file  but it is risky to edit solrconfig.xml if he does not 
> understand solr
> I propose a properties file in the instancedir as solrcore.properties . If 
> present would be loaded and added as core specific properties.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1335) load core properties from a properties file

2009-08-20 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745678#action_12745678
 ] 

Lance Norskog commented on SOLR-1335:
-

About the features for substituting properties files:

I have run multiple Solr instances (servlets) in the same container. Yes, 
multicore is the better way, but we should not force the user to have only one 
Solr per Tomcat. So, we should not force only one properties file via 
System.properties.

I would ask that if a configuration file uses a properties file, that 
configuration file should have the ability to name its own properties file. For 
example, solrconfig.xml should have its own entry for adding properties files. 
But if solrconfig.xml names a file, then solr.xml should be able to override 
that file name. To do this, the properties files should be named. 

In conf/query_server.properties:
{code}
fq.size=400
{code}
In foo/conf/solrconfig.xml:
{code:xml}
conf/query_server.properties
{code}
Later in solrconfig.xml:
{code:xml}

{code}

Then solr.xml can override the query_servers properties file. In solr.xml:
{code:xml}

${core}/conf/query_server_mini.properties

{code}

This just gets worse and worse :)  





[jira] Commented: (SOLR-1373) Add filter query in solr/admin/form.jsp

2009-08-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745675#action_12745675
 ] 

Hoss Man commented on SOLR-1373:


bq. Anything that's commonly used should be in Solr's equivalent of advanced 
search

Agreed ... but there is a risk that some "common" options aren't ubiquitous, so 
if we add them to the form, people could get confused if a param doesn't work 
because the request handler they are using doesn't support it.

in the case of fq, i think the pros outweigh the cons ... that's why i committed 
your patch.

bq. I'm curious why we're not interested in making a more advanced UI?

i don't remember anyone saying that ... i just don't think hardcoding a bunch 
of params into form.jsp is the way to go.  i'd love to see a good UI that could 
inspect the solr configuration and display the appropriate options in a GUI -- 
ala the wiki page i mentioned (but apparently didn't link to) ...

http://wiki.apache.org/solr/MakeSolrMoreSelfService

> Add filter query in solr/admin/form.jsp
> ---
>
> Key: SOLR-1373
> URL: https://issues.apache.org/jira/browse/SOLR-1373
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Hoss Man
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1373.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The full interface needs a filter query text field.




[jira] Commented: (SOLR-1373) Add filter query in solr/admin/form.jsp

2009-08-20 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745646#action_12745646
 ] 

Jason Rutherglen commented on SOLR-1373:


Anything that's commonly used should be in Solr's equivalent of
advanced search. Users ask me what a filter query is; they don't
know it exists, and end up writing clauses that should be filters
in the query section. Putting fq in the form.jsp will hopefully
help. 

I'm curious why we're not interested in making a more advanced
UI?




[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-20 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745643#action_12745643
 ] 

Jason Rutherglen commented on SOLR-1275:


I like using XML. I was wondering why, in
TermVectorComponentTest.testDistributed, we're manually creating
the distributed part? I'm currently picking apart
TestDistributedSearch to test out a distributed component (a bit
of work). 

expungeDeletes is supposed to remove segments that contain
deletes. The patch posted does this by checking the segment
names. This will work fine with NRT.

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch, SOLR-1275.patch, SOLR-1275.patch, 
> SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method somewhat like optimize is offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.




Re: svn commit: r804477 - in /lucene/solr/trunk/example/exampledocs: hd.xml mem.xml sd500.xml vidcard.xml

2009-08-20 Thread Chris Hostetter

: OK - any suggestions on what field to use for our example docs?  It

i changed it to be manufacturedate_dt since that fits with the existing 
scheme ... the data is all made up, but so is all the rest of our data.


-Hoss



[jira] Updated: (SOLR-914) Presence of finalize() in the codebase

2009-08-20 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-914:
--

Attachment: SOLR-914.patch

revised patch with the improvements i mentioned earlier, also fixes a cut/paste 
mistake in one of the log messages.

> Presence of finalize() in the codebase 
> ---
>
> Key: SOLR-914
> URL: https://issues.apache.org/jira/browse/SOLR-914
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 1.3
> Environment: Tomcat 6, JRE 6
>Reporter: Kay Kay
>Assignee: Hoss Man
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-914.patch, SOLR-914.patch
>
>   Original Estimate: 480h
>  Remaining Estimate: 480h
>
> There seems to be a number of classes that implement the finalize() method.  
> Given that it is perfectly OK for a Java VM never to call it, maybe there 
> has to be some other way { try .. finally - when they are created, to 
> destroy them } to destroy them; the presence of a finalize() method, 
> depending on implementation, might not serve what we want and in some cases 
> can end up delaying the GC process, depending on the algorithms. 
> $ find . -name *.java | xargs grep finalize
> ./contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/JdbcDataSource.java:
>   protected void finalize() {
> ./src/java/org/apache/solr/update/SolrIndexWriter.java:  protected void 
> finalize() {
> ./src/java/org/apache/solr/core/CoreContainer.java:  protected void 
> finalize() {
> ./src/java/org/apache/solr/core/SolrCore.java:  protected void finalize() {
> ./src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:  protected 
> void finalize() throws Throwable {
> Maybe we need to revisit these occurrences from a design perspective to see 
> if they are necessary / if there is an alternate way of managing guaranteed 
> destruction of resources. 




[jira] Commented: (SOLR-1335) load core properties from a properties file

2009-08-20 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745602#action_12745602
 ] 

Erik Hatcher commented on SOLR-1335:


Along the same lines as making the master/searcher determination through 
properties, it would be nice to be able to conditionally enable/disable, say, 
the /update handler by some deploy-time switch. Noble - does it make sense to 
consider this type of use here?
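
The deploy-time switch Erik describes might look something like the following in solrconfig.xml (a sketch only - the `enable` attribute and `solr.update.enabled` property name are hypothetical here, just to illustrate the idea), with the property set per deployment in the dropped-in properties file:

```xml
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler"
                enable="${solr.update.enabled:true}" />
```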




[jira] Commented: (SOLR-1373) Add filter query in solr/admin/form.jsp

2009-08-20 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745582#action_12745582
 ] 

Erik Hatcher commented on SOLR-1373:


Yeah, I kinda cringed with this one myself, but it's a basic one that seems 
reasonable to add... but then so could a defType selector, etc.  Now we're 
talking Solr Explorer and the realm of front-end frameworks like Flare, 
Solritas, and others popping up. 




[jira] Resolved: (SOLR-1373) Add filter query in solr/admin/form.jsp

2009-08-20 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1373.


Resolution: Fixed
  Assignee: Hoss Man

I'm generally opposed to a proliferation of params on the admin search form 
(unless we add a way for them to be configured ala MakeSolrMoreSelfService) but 
fq is at least as ubiquitous as highlighting at this point.

Committed revision 806303.





[jira] Resolved: (SOLR-1371) LukeRequestHandler/schema.jsp errors if schema has no uniqueKey field

2009-08-20 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-1371.


Resolution: Fixed

Committed revision 806289.


> LukeRequestHandler/schema.jsp errors if schema has no uniqueKey field
> -
>
> Key: SOLR-1371
> URL: https://issues.apache.org/jira/browse/SOLR-1371
> Project: Solr
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 1.3
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 1.4
>
> Attachments: SOLR-1371.patch
>
>
> if a schema doesn't have a uniqueKey field specified, the schema explorer 
> won't work, and logs this exception...
> {code}
> SEVERE: java.lang.NullPointerException
> at 
> org.apache.solr.handler.admin.LukeRequestHandler.getSchemaInfo(LukeRequestHandler.java:373)
> at 
> org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:133)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> {code}




Re: Fwd: [Solr Wiki] Trivial Update of "HowToRelease" by YonikSeeley

2009-08-20 Thread Chris Hostetter

: > + This page is to help a Solr committer create a new release (you need
: > committer rights for some of the steps to create an official release).  It
: > does not reflect official release policy - many of the items may be
: > optional, or may be modified as necessary.
: > - ''This page is prepared for Solr committers. You need committer rights
: > - to create a new  Solr release.''
: 
: I really don't think this is a good idea.  What gets released and how it gets
: released should not be up to the RM.  We as a community have agreed to support
: the artifacts we produce.  One individual should not then get to undermine
: that b/c they don't have a particular use for some particular artifact or
: release step.

This feels like a disagreement of semantics...

the community helps shape the release guidelines via the wiki, but 
strictly speaking yonik is correct: it's not formal policy (the PMC didn't 
vote on it).  The RM can follow those guidelines as tightly/loosely as 
they choose to make an RC, but they have to put it to a vote of the PMC 
before it is truly a release -- if the PMC feels community consensus was 
not followed closely enough by the RM, the release won't get the votes it 
needs.


-Hoss



[jira] Commented: (SOLR-1362) WordDelimiterFilter position increment bug

2009-08-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745541#action_12745541
 ] 

Yonik Seeley commented on SOLR-1362:


bq. I assume it should be as an option for back compat?

I guess so, yes... there may be configurations where it makes sense.


> WordDelimiterFilter position increment bug
> --
>
> Key: SOLR-1362
> URL: https://issues.apache.org/jira/browse/SOLR-1362
> Project: Solr
>  Issue Type: Bug
>  Components: Analysis
>Reporter: Robert Muir
>Priority: Minor
> Attachments: SOLR-1362.patch
>
>
> WordDelimiterFilter sometimes assigns high position increment values, which 
> inhibits phrase matches.
> If this is a feature and not a bug please change the issue type, and I will 
> change the patch to propose this as an option...




[jira] Commented: (SOLR-1362) WordDelimiterFilter position increment bug

2009-08-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745520#action_12745520
 ] 

Robert Muir commented on SOLR-1362:
---

ah, i see your point... 

sounds right to me. i can reformulate the patch/tests in this direction.

I assume it should be as an option for back compat?




[jira] Commented: (SOLR-1362) WordDelimiterFilter position increment bug

2009-08-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745516#action_12745516
 ] 

Yonik Seeley commented on SOLR-1362:


bq. Yonik, in this case I think existing gaps would be preserved with the =

What if the big position increment was on a token that was all delimiters?

I agree that it makes more sense for "LUCENE / SOLR" to be translated to LUCENE 
SOLR without a gap though (provided that there are no gaps to start with).

Should the rule be, subtract 1 from the cumulative position increment if the 
increment of the current token being added is >=1 ?
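
One possible reading of that rule, sketched as arithmetic (my interpretation only, not the committed behavior): increments from dropped delimiter-only tokens accumulate, and one position is collapsed when a real token follows them, so "LUCENE / SOLR" loses the gap while pre-existing gaps (e.g. from stopword removal) survive.

```java
public class PosIncSketch {
    /**
     * Position increment for an emitted token, given the increments
     * accumulated from dropped delimiter-only tokens and the token's
     * own increment. Collapses exactly one position when a dropped
     * token actually contributed a position.
     */
    static int emittedIncrement(int accumulatedFromDropped, int ownIncrement) {
        int total = accumulatedFromDropped + ownIncrement;
        return (accumulatedFromDropped >= 1 && ownIncrement >= 1) ? total - 1 : total;
    }

    public static void main(String[] args) {
        // "LUCENE / SOLR": "/" dropped with increment 1; SOLR collapses the gap
        System.out.println(emittedIncrement(1, 1)); // 1
        // plain adjacent token, nothing dropped: increment unchanged
        System.out.println(emittedIncrement(0, 1)); // 1
        // a pre-existing gap carried by the dropped token survives
        System.out.println(emittedIncrement(2, 1)); // 2
    }
}
```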





Re: Solr 1.4 Work

2009-08-20 Thread Grant Ingersoll

Here's another interesting view of Solr issues:  
https://issues.apache.org/jira/browse/SOLR?report=com.atlassian.jira.plugin.system.project:popularissues-panel

Most popular.  Field collapsing is #1 by more than twice the next  
one.  I'm not proposing it for 1.4, but it certainly seems like we  
need to make sure it is in 1.5.


On Aug 18, 2009, at 2:37 PM, Grant Ingersoll wrote:


https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true

Still 20ish issues to be worked on.





Deleting a field at runtime

2009-08-20 Thread KishoreVeleti CoreObjects

Hi All,

Just completed an interview on SOLR - one of the questions was "is it
possible to remove a field from an existing index?". I am not sure what the
business use case is here. 

My understanding is that it is not possible. Still, I wanted to hear from SOLR
experts: is it possible to remove a field from an existing index?

Thanks in Advance,
Kishore Veleti A.V.K.
-- 
View this message in context: 
http://www.nabble.com/Deleting-a-field-at-runtime-tp25066329p25066329.html
Sent from the Solr - Dev mailing list archive at Nabble.com.


[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-20 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745468#action_12745468
 ] 

Yonik Seeley commented on SOLR-1275:


Hmmm, it's harder to see in diff format... the test just boils down to this:

{code}
  public void testExpungeDeletes() throws Exception {
    assertU(adoc("id","1"));
    assertU(adoc("id","2"));
    assertU(commit());

    assertU(adoc("id","3"));
    assertU(adoc("id","2"));
    assertU(adoc("id","4"));
    assertU(commit());

    SolrQueryRequest sr = req("q","foo");
    SolrIndexReader r = sr.getSearcher().getReader();
    assertTrue(r.maxDoc() > r.numDocs());       // should have deletions
    assertTrue(r.getLeafReaders().length > 1);  // more than 1 segment
    sr.close();

    assertU(commit("expungeDeletes","true"));

    sr = req("q","foo");
    r = sr.getSearcher().getReader();
    assertEquals(r.maxDoc(), r.numDocs());      // no deletions
    assertTrue(r.getLeafReaders().length > 1);  // still more than 1 segment
    sr.close();
  }
{code}




[jira] Updated: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-20 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1275:
---

Attachment: SOLR-1275.patch

How about this much simpler test that's faster to execute, and should be very 
easy to fix if/when the behavior changes in the future (which it certainly could 
with NRT stuff). It also tests more of the complete loop by going through 
the XML update command.
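
For reference, the XML update command exercised by the test (which calls commit("expungeDeletes","true")) is presumably the commit message with the new attribute, along the lines of:

```xml
<commit expungeDeletes="true"/>
```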




[jira] Commented: (SOLR-1335) load core properties from a properties file

2009-08-20 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745373#action_12745373
 ] 

Noble Paul commented on SOLR-1335:
--

bq. If it's something we know we're going to want to do, and it's going to keep 
the code simpler in the long run, we might as well do it right the first time.

I'll propose this. Anyway, we are planning to move to a stage where we will use 
solr.xml for single core as well. So I shall add the configuration of the 
properties in the  tag as follows:
{code:xml}

{code}

For single core, let us fix a file name, so that when we introduce solr.xml for 
single core it becomes automatically configurable. 




[jira] Updated: (SOLR-1330) the details command shows current replication status when no replication is going on

2009-08-20 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1330:
-

Attachment: SOLR-1330.patch

simplified  and removed a few thread-unsafe operations

> the details command shows current replication status when no replication is 
> going on
> 
>
> Key: SOLR-1330
> URL: https://issues.apache.org/jira/browse/SOLR-1330
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-1330.patch, SOLR-1330.patch, SOLR-1330.patch, 
> SOLR-1330.patch, SOLR-1330.patch, SOLR-1330.patch
>
>
> The details of current replication should be shown only when a replication is 
> going on. It would also be useful if the history of past replications are 
> also captured




[jira] Updated: (SOLR-1330) the details command shows current replication status when no replication is going on

2009-08-20 Thread Akshay K. Ukey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay K. Ukey updated SOLR-1330:
-

Attachment: SOLR-1330.patch

Same patch as previous one, in sync with trunk.
