[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)

2008-10-20 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-561:
---

Attachment: SOLR-561.patch

Updated patch with a couple of bug fixes related to closing connections and 
the ref-counted index searcher. Other cosmetic changes include code formatting 
and javadocs.

Noble has put up a wiki page at http://wiki.apache.org/solr/SolrReplication 
detailing the features and configuration.
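For reference, a slave-side configuration along the lines described on that wiki page might look like the following sketch. The element names and values here are illustrative of the general shape only; see the wiki page for the authoritative configuration.

```xml
<!-- Hypothetical slave-side replication setup in solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- URL of the master core's replication handler -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- how often the slave polls the master (hh:mm:ss) -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```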

> Solr replication by Solr (for windows also)
> ---
>
> Key: SOLR-561
> URL: https://issues.apache.org/jira/browse/SOLR-561
> Project: Solr
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 1.4
> Environment: All
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: deletion_policy.patch, SOLR-561-core.patch, 
> SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, 
> SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
> SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch
>
>
> The current replication strategy in Solr involves shell scripts. The 
> following are the drawbacks of that approach:
> * It does not work on Windows.
> * Replication runs as a separate piece, not integrated with Solr.
> * Replication cannot be controlled from the Solr admin pages or JMX.
> * Each operation requires a manual telnet/login to the host.
> Doing the replication in Java has the following advantages:
> * Platform independence.
> * Manual steps can be eliminated entirely; everything can be driven from 
> solrconfig.xml.
> ** Adding the URL of the master to the slaves should be enough to enable 
> replication. Other settings, such as the frequency of snapshoot/snappull, 
> can also be configured; all other information can be obtained automatically.
> * Start/stop can be triggered from solr/admin or JMX.
> * Status/progress can be queried while replication is running, and an 
> ongoing replication can be aborted.
> * No login to the machine is needed.
> * From a development perspective, it can be unit tested.
> This issue tracks the implementation of Solr replication in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)

2008-10-20 Thread Akshay K. Ukey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay K. Ukey updated SOLR-561:


Attachment: SOLR-561.patch

Patch with minor fixes related to the admin page.

> Solr replication by Solr (for windows also)
> ---
>
> Key: SOLR-561
> URL: https://issues.apache.org/jira/browse/SOLR-561
> Project: Solr
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 1.4
> Environment: All
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: deletion_policy.patch, SOLR-561-core.patch, 
> SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, 
> SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
> SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
> SOLR-561.patch, SOLR-561.patch




[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)

2008-10-20 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-561:
---

Attachment: SOLR-561.patch

Another iteration over Akshay's patch.

# Made the collections used for keeping statistics synchronized to avoid 
concurrent modification exceptions.
# Removed @author tags and added @version and @since 1.4 tags

> Solr replication by Solr (for windows also)
> ---
>
> Key: SOLR-561
> URL: https://issues.apache.org/jira/browse/SOLR-561
> Project: Solr
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 1.4
> Environment: All
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: deletion_policy.patch, SOLR-561-core.patch, 
> SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, 
> SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
> SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
> SOLR-561.patch, SOLR-561.patch, SOLR-561.patch




immediate commit of docs doesn't work in multiCore case

2008-10-20 Thread Parisa

Hi,

I want to see a doc in my search results as soon as I commit it, so I did
as you suggested:

solrUrl = "http://mySolrServer:8080/solr/core1/";

CommonsHttpSolrServer server = new CommonsHttpSolrServer(solrUrl);
server.setParser(new XMLResponseParser());

UpdateRequest req = new UpdateRequest();
req.setAction(UpdateRequest.ACTION.COMMIT, false, false);
req.add(solrDoc);
UpdateResponse rsp = req.process(server);

As you see, I use a multicore config and everything is OK, but after the
commit I can't see this doc in the search results unless I restart the
Solr server.

I have also tested it with this code:

server.add(solrDoc);
server.commit(false, false);

I traced the commit method in the DirectUpdateHandler2 class; it is called
and works correctly.


Regards,

Parisa


 


 


-- 
View this message in context: 
http://www.nabble.com/immediatley-commit-of-docs-doesn%27t-work-in-multiCore-case-tp20069593p20069593.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



Re: immediate commit of docs doesn't work in multiCore case

2008-10-20 Thread Parisa

I should mention that I am using apache-solr-1.3.0.



-- 
View this message in context: 
http://www.nabble.com/immediatley-commit-of-docs-doesn%27t-work-in-multiCore-case-tp20069593p20070128.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Updated: (SOLR-818) NullPointerException at org.apache.solr.client.solrj.SolrQuery.setFields(SolrQuery.java:361)

2008-10-20 Thread Gunnar Wagenknecht (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunnar Wagenknecht updated SOLR-818:


Attachment: solr_818_npe-solrquery.patch

Patch to fix the problem.

> NullPointerException at 
> org.apache.solr.client.solrj.SolrQuery.setFields(SolrQuery.java:361)
> 
>
> Key: SOLR-818
> URL: https://issues.apache.org/jira/browse/SOLR-818
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Gunnar Wagenknecht
> Attachments: solr_818_npe-solrquery.patch
>
>
> {noformat}
> java.lang.NullPointerException
>   at org.apache.solr.client.solrj.SolrQuery.setFields(SolrQuery.java:361)
> {noformat}
> This happens when calling {{SolrQuery#setFields(null);}}.
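The attached patch is not inlined in this thread; presumably it guards the varargs setter against a null array. A standalone sketch of the failure mode and the fix follows. `QueryParams` and its field handling are hypothetical illustrations, not the actual SolrJ `SolrQuery` code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT the SOLR-818 patch itself: a call like
// setFields(null) arrives as a null varargs array, so iterating or
// joining it without a guard throws NullPointerException.
class QueryParams {
    private final Map<String, String> params = new HashMap<>();

    void setFields(String... fields) {
        if (fields == null || fields.length == 0) {
            params.remove("fl"); // treat null/empty as "clear the field list"
            return;
        }
        params.put("fl", String.join(",", fields));
    }

    String get(String name) {
        return params.get(name);
    }
}
```

With the guard in place, `setFields(null)` simply clears the field list instead of throwing.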




[jira] Created: (SOLR-818) NullPointerException at org.apache.solr.client.solrj.SolrQuery.setFields(SolrQuery.java:361)

2008-10-20 Thread Gunnar Wagenknecht (JIRA)
NullPointerException at 
org.apache.solr.client.solrj.SolrQuery.setFields(SolrQuery.java:361)


 Key: SOLR-818
 URL: https://issues.apache.org/jira/browse/SOLR-818
 Project: Solr
  Issue Type: Bug
  Components: clients - java
Affects Versions: 1.3
Reporter: Gunnar Wagenknecht
 Attachments: solr_818_npe-solrquery.patch

{noformat}
java.lang.NullPointerException
at org.apache.solr.client.solrj.SolrQuery.setFields(SolrQuery.java:361)
{noformat}

This happens when calling {{SolrQuery#setFields(null);}}.




[jira] Commented: (SOLR-813) Add new DoubleMetaphone Filter and Factory

2008-10-20 Thread Todd Feak (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641032#action_12641032
 ] 

Todd Feak commented on SOLR-813:


Good catch on that bug, and good enhancements. I've put them into my current 
implementation. Thank you.

> Add new DoubleMetaphone Filter and Factory
> --
>
> Key: SOLR-813
> URL: https://issues.apache.org/jira/browse/SOLR-813
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Todd Feak
>Assignee: Ryan McKinley
>Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-813.patch, SOLR-813.patch, SOLR-813.patch
>
>
> The existing PhoneticFilter allows for use of the DoubleMetaphone encoder. 
> However, it doesn't expose the maxCodeLength() setting, and it ignores the 
> alternate encodings that the encoder provides for some words. This new filter 
> is not as generic as the PhoneticFilter, but allows more detailed control 
> over the DoubleMetaphone encoder.




Re: immediate commit of docs doesn't work in multiCore case

2008-10-20 Thread Ryan McKinley

Do you have the XmlUpdateRequestHandler mapped to /update?

if that fixes it, we should make a bigger note on:
http://wiki.apache.org/solr/Solrj

ryan
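For anyone hitting this, the mapping Ryan is referring to is a solrconfig.xml entry in each core, along these lines (a sketch; the class name is as shipped in Solr 1.3):

```xml
<!-- Register the XML update handler at /update so SolrJ's UpdateRequest,
     which posts to /update by default, reaches this core. -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
```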








[jira] Commented: (SOLR-815) Add new Japanese half-width/full-width normalization Filter and Factory

2008-10-20 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641071#action_12641071
 ] 

Walter Underwood commented on SOLR-815:
---

I looked it up, and even found a reason to do it the right way.

Latin should be normalized to halfwidth (in the Latin-1 character space).

Kana should be normalized to fullwidth.

Normalizing Latin characters to fullwidth would mean you could not use the 
existing accent-stripping filters or probably any other filter that expected 
Latin-1, like synonyms. Normalizing to halfwidth makes the rest of Solr and 
Lucene work as expected.

See section 12.5: http://www.unicode.org/versions/Unicode5.0.0/ch12.pdf

The compatibility forms (the ones we normalize away from) are in the Unicode 
range U+FF00 to U+FFEF.
The correct mappings from those forms are in this doc: 
http://www.unicode.org/charts/PDF/UFF00.pdf

Other charts are here: http://www.unicode.org/charts/
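Walter's recommendation (Latin to halfwidth) can be sketched directly from the UFF00 chart: the fullwidth ASCII variants U+FF01..U+FF5E sit at a fixed offset of 0xFEE0 from their ASCII counterparts, and the ideographic space U+3000 maps to U+0020. A hypothetical helper illustrating that mapping, not the SOLR-815 patch itself:

```java
// Sketch of halfwidth normalization for Latin, per the U+FF00..U+FFEF chart.
// Kana normalization (to fullwidth) is a separate table and is omitted here.
final class WidthNormalizer {
    static String toHalfwidthLatin(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - 0xFEE0)); // fullwidth ASCII variant -> ASCII
            } else if (c == '\u3000') {
                sb.append(' ');                 // ideographic space -> ASCII space
            } else {
                sb.append(c);                   // everything else untouched
            }
        }
        return sb.toString();
    }
}
```

After this pass, downstream Latin-1 filters (accent stripping, synonyms) see ordinary ASCII, which is the point of Walter's argument.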


> Add new Japanese half-width/full-width normalization Filter and Factory
> --
>
> Key: SOLR-815
> URL: https://issues.apache.org/jira/browse/SOLR-815
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Todd Feak
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-815.patch
>
>
> Japanese Katakana and Latin alphabet characters exist in both "half-width" 
> and "full-width" versions. This new Filter normalizes to the full-width 
> version so that both can be used for searching and indexing.




[jira] Updated: (SOLR-815) Add new Japanese half-width/full-width normalization Filter and Factory

2008-10-20 Thread Todd Feak (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Feak updated SOLR-815:
---

Attachment: SOLR-815.patch

That's a good reason to switch the Latin mappings. I reversed them and updated 
the Javadoc and tests as well. New patch attached.

> Add new Japanese half-width/full-width normalization Filter and Factory
> --
>
> Key: SOLR-815
> URL: https://issues.apache.org/jira/browse/SOLR-815
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Todd Feak
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-815.patch, SOLR-815.patch




[jira] Commented: (SOLR-815) Add new Japanese half-width/full-width normalization Filter and Factory

2008-10-20 Thread Todd Feak (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641100#action_12641100
 ] 

Todd Feak commented on SOLR-815:


For our purposes, the Japanese and Latin characters were all we were interested 
in, and that's what has been contributed.

For future growth, the entire set of characters that have mappings in this 
space could be included.  At that point, maybe the Filter and Factory name 
would change to be non-Japanese specific.

> Add new Japanese half-width/full-width normalization Filter and Factory
> --
>
> Key: SOLR-815
> URL: https://issues.apache.org/jira/browse/SOLR-815
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Todd Feak
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-815.patch, SOLR-815.patch




[jira] Created: (SOLR-819) Add Arabic Support

2008-10-20 Thread Grant Ingersoll (JIRA)
Add Arabic Support
--

 Key: SOLR-819
 URL: https://issues.apache.org/jira/browse/SOLR-819
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Grant Ingersoll
Priority: Minor


https://issues.apache.org/jira/browse/LUCENE-1406 adds Arabic support to 
Lucene.  Let's generate token filters for Solr.




[jira] Updated: (SOLR-769) Support Document and Search Result clustering

2008-10-20 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-769:
-

Attachment: SOLR-769.patch

Removed the alternate algorithm implementations, but left in some of the 
framework for adding them.  The Carrot2 maintainers are likely to remove Fuzzy 
Ants and some of the other implementations in 3.0, which is due out sometime 
soon.  Thus, I'd rather not support something that isn't recommended.

I'm likely to commit this fairly soon.

-Grant

> Support Document and Search Result clustering
> -
>
> Key: SOLR-769
> URL: https://issues.apache.org/jira/browse/SOLR-769
> Project: Solr
>  Issue Type: New Feature
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: clustering-libs.tar, clustering-libs.tar, 
> SOLR-769.patch, SOLR-769.patch, SOLR-769.patch, SOLR-769.patch
>
>
> Clustering is a useful tool for working with documents and search results, 
> similar to the notion of dynamic faceting.  Carrot2 
> (http://project.carrot2.org/) is a nice, BSD-licensed, library for doing 
> search results clustering.  Mahout (http://lucene.apache.org/mahout) is well 
> suited for whole-corpus clustering.  
> The patch lays out a contrib module that starts off w/ an integration of a 
> SearchComponent for doing clustering and an implementation using Carrot.  In 
> search results mode, it will use the DocList as the input for the cluster.   
> While Carrot2 comes w/ a Solr input component, it is not the same as the 
> SearchComponent that I have in that the Carrot example actually submits a 
> query to Solr, whereas my SearchComponent is just chained into the Component 
> list and uses the ResponseBuilder to add in the cluster results.
> While not fully fleshed out yet, the collection based mode will take in a 
> list of ids or just use the whole collection and will produce clusters.  
> Since this is a longer, typically offline task, there will need to be some 
> type of storage mechanism (and replication??) for the clusters.  I _may_ 
> push this off to a separate JIRA issue, but I at least want to present the 
> use case as part of the design of this component/contrib.  It may even make 
> sense that we split this out, such that the building piece is something like 
> an UpdateProcessor and then the SearchComponent just acts as a lookup 
> mechanism.




dynamic filtering

2008-10-20 Thread xibin

Hi -

I have done some searching and haven't found what I was looking for. I hope
this has not been discussed in the forum already.

This is a question as well as looking for design ideas.

I need the ability to dynamically filter out search results as they are
being collected. The logic I am developing cannot be applied statically at
indexing time, so the data is NOT available in indexed form. It can be
derived/calculated from one or more of the indexed fields, and it is
different for each query. The purpose of this "derived field" is to
eliminate hits so that only a subset is considered. This is very similar
to the Filter concept already in SolrIndexSearcher.QueryCommand. The
difference is that I can't write a Lucene Query to obtain the subset of
documents; I need to implement an algorithm that uses some of the fields
in the documents.

What I had in mind is the concept of a DynamicFilter. A DynamicFilter
could be used in the HitCollectors (in SolrIndexSearcher) to perform
dynamic filtering as results are being collected. DynamicFilters would be
added to the SolrIndexSearcher.QueryCommand class so they can be invoked
at collection time. I considered writing a SearchComponent or a
RequestHandler, but they seem to be slightly off for the timing I need.

The parameters to construct my DynamicFilters are passed in via HTTP query
params. I could pick them up and create the DynamicFilters in the
QueryComponent as it creates a QueryCommand. The QueryCommand would then
use them during hit collection in SolrIndexSearcher.

I hope this captures the detail of what I am trying to do. I am looking
for validation/alternative suggestions from an insider (yonik?). I feel
bad about having to make such an intrusive modification and am open to
suggestions. I would be interested in contributing this work if it turns
out to be valuable.

Thanks for reading.

Xibin
-- 
View this message in context: 
http://www.nabble.com/dynamic-filtering-tp20079841p20079841.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



Re: dynamic filtering

2008-10-20 Thread Ryan McKinley

I don't quite follow what you are trying to do; can you give a concrete
example?

What is missing from the Filter side?

if you need to modify the lucene Query, a custom SearchComponent is  
the way to go...



On Oct 20, 2008, at 6:13 PM, xibin wrote:



Hi -

I have done some searching and haven't found what I was looking for.  
I hope

this has not been discussed in the forum already.

This is a question as well as looking for design ideas.

I need the ability to dynamically filter out search results as they  
are
being collected. The logic that I am developing cannot be statically  
applied
at indexing time, so the data is NOT available in the indexed form.  
It can
be derived/calculated using one or more of the indexed fields, and  
it's

different for each query. The purpose of this "derived field" is to
eliminate resulting indexes so only a subset is considered. This is  
very
similar to the Filter concept already in the  
SolrIndexSearcher.QueryCommand.
The difference is that I can't write a Lucene Query to obtain the  
subset of
indices, I need to implement an algorithm involving using some of  
the fields

in the documents.

What I had in mind is the concept of a DyamicFilter. A DynamicFilter  
can be

used in the HitCollectors (in SolrIndexSearcher) to perform dynamic
filtering as results are being collected. DynamicFilters would be  
added into

the SolrSearchIndexer.QueryCommand class so they can be called during
collecting time. I considered writing a SearchComponent or a  
RequestHandler,

and they seem to be a little bit off for the timing that I needed.

The parameters to construct my DynamicFilters are passed in from  
http query

params. I could pick them up and create the DynamicFilters in the
QueryComponent as it creates a QueryCommand. The QueryCommand will  
then use

it during Hit Collection in SolrIndexSearcher.

I hope this captures the detail of what I am trying to do. I am  
looking for
validation/alternative suggestions from an insider (yonik?). I feel  
bad
having to do the intrusive modification here, and am open for  
suggestions. I

would be interested in contributing this work if it turns out to be
valuable.

Thanks for reading.

Xibin
--
View this message in context: 
http://www.nabble.com/dynamic-filtering-tp20079841p20079841.html
Sent from the Solr - Dev mailing list archive at Nabble.com.





Re: dynamic filtering

2008-10-20 Thread xibin

Thanks for the reply.

Let's say I have a location code field in the document, and there are 2000
indexed documents with various location codes. When a user searches, she
specifies her own location and the maximum radius she wants the distance
to be, in addition to any other search fields she may specify.

To satisfy this scenario, I am thinking about performing the search using
all the other fields normally, but using the distance calculation to
dynamically filter out unwanted results. That way the distance calculation
performs the filtering, and the relevancy from the other search fields
does not change.

Let me know if this makes sense to you.

Thanks
Xibin
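The distance-filtering idea above can be sketched as a predicate consulted during hit collection. This is illustrative plain Java under assumed names (FilteringCollector, acceptedDocs), not Lucene/Solr API; in Solr the logic would live in the HitCollectors used by SolrIndexSearcher:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Conceptual sketch of the proposed "DynamicFilter": a collector that
// consults a per-document predicate (e.g. a distance check against an
// indexed location field) before accepting a hit.
class FilteringCollector {
    private final IntPredicate dynamicFilter;       // e.g. doc -> distance(doc) <= maxRadius
    private final List<Integer> accepted = new ArrayList<>();

    FilteringCollector(IntPredicate dynamicFilter) {
        this.dynamicFilter = dynamicFilter;
    }

    // Called once per matching doc; scoring is untouched, docs are only dropped.
    void collect(int doc, float score) {
        if (dynamicFilter.test(doc)) {
            accepted.add(doc);
        }
    }

    List<Integer> acceptedDocs() {
        return accepted;
    }
}
```

The predicate would be built from the HTTP query params (user location, max radius) at request time, which matches the "different for each query" requirement: relevancy from the other fields is preserved because the score is never modified, only discarded along with the doc.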


-- 
View this message in context: 
http://www.nabble.com/dynamic-filtering-tp20079841p20080481.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



Re: dynamic filtering

2008-10-20 Thread Ryan McKinley

check:
https://issues.apache.org/jira/browse/LUCENE-1387

On Oct 20, 2008, at 6:56 PM, xibin wrote:



Thanks for the reply.

Let's say that I have a location code field in the document. Say  
there are

2000 indexed documents that have various location codes. When a user
searches, she specifies her own location, as well as the maximum  
radius that
she wants the distance to be, in addition to other search fields  
that she

may specify.

In order to satisfy this scenario, I am thinking about performing  
the search

using all the other fields normally, but use a distance to dynamically
filter out unwanted results. This way the distance calculation  
performs
filtering, and the relevancy from other search fields does not  
change...


Let me know if this makes sense to you.

Thanks
Xibin


ryantxu wrote:


I don't quite follow what you are trying to do
can you give a concrete example?

What is missing from the Filter side?

if you need to modify the lucene Query, a custom SearchComponent is
the way to go...


On Oct 20, 2008, at 6:13 PM, xibin wrote:



Hi -

I have done some searching and haven't found what I was looking for. I hope
this has not been discussed in the forum already.

This is a question as well as a request for design ideas.

I need the ability to dynamically filter out search results as they are
being collected. The logic that I am developing cannot be statically applied
at indexing time, so the data is NOT available in indexed form. It can be
derived/calculated using one or more of the indexed fields, and it's
different for each query. The purpose of this "derived field" is to
eliminate results so that only a subset is considered. This is very similar
to the Filter concept already in SolrIndexSearcher.QueryCommand. The
difference is that I can't write a Lucene Query to obtain the subset of
documents; I need to implement an algorithm that uses some of the fields in
the documents.

What I had in mind is the concept of a DynamicFilter. A DynamicFilter can be
used in the HitCollectors (in SolrIndexSearcher) to perform dynamic
filtering as results are being collected. DynamicFilters would be added to
the SolrIndexSearcher.QueryCommand class so they can be called at collecting
time. I considered writing a SearchComponent or a RequestHandler, but they
seem to be a little bit off for the timing that I need.

The parameters to construct my DynamicFilters are passed in from HTTP query
params. I could pick them up and create the DynamicFilters in the
QueryComponent as it creates a QueryCommand. The QueryCommand will then use
them during hit collection in SolrIndexSearcher.

I hope this captures the detail of what I am trying to do. I am looking for
validation/alternative suggestions from an insider (yonik?). I feel bad
about having to make this intrusive modification, and am open to
suggestions. I would be interested in contributing this work if it turns out
to be valuable.

Thanks for reading.

Xibin
--
View this message in context:
http://www.nabble.com/dynamic-filtering-tp20079841p20079841.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
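A minimal, Lucene-free sketch of the collect-time filtering idea described in the message above. The DynamicFilter interface and FilteringCollector class here are illustrative assumptions based on the proposal, not Solr or Lucene APIs:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-document predicate, consulted as hits are collected. */
interface DynamicFilter {
    boolean accept(int docId);
}

/** A collector that asks every DynamicFilter before keeping a hit. */
class FilteringCollector {
    private final List<DynamicFilter> filters;
    private final List<Integer> hits = new ArrayList<>();

    FilteringCollector(List<DynamicFilter> filters) {
        this.filters = filters;
    }

    void collect(int docId) {
        for (DynamicFilter f : filters) {
            if (!f.accept(docId)) {
                return; // dropped at collect time, relevancy of kept hits unchanged
            }
        }
        hits.add(docId);
    }

    List<Integer> hits() {
        return hits;
    }
}

public class DynamicFilterDemo {
    public static void main(String[] args) {
        // Example filter: keep only even doc ids (a stand-in for a derived-field check).
        DynamicFilter evenOnly = docId -> docId % 2 == 0;
        FilteringCollector c = new FilteringCollector(List.of(evenOnly));
        for (int doc = 0; doc < 6; doc++) {
            c.collect(doc);
        }
        System.out.println(c.hits()); // [0, 2, 4]
    }
}
```

In a real integration the predicate would read indexed field values for the document, which is where the per-query "derived field" computation would live.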







--
View this message in context: 
http://www.nabble.com/dynamic-filtering-tp20079841p20080481.html
Sent from the Solr - Dev mailing list archive at Nabble.com.





Must QueryComponent always be on and other Design Questions

2008-10-20 Thread Grant Ingersoll
I've run into this a couple of times now and I feel like it warrants a
discussion.

For both the SpellCheckComponent (SCC) and now for the new
ClusteringComponent (SOLR-769), I think there are cases where the
QueryComponent (QC) is not required. In the SpellCheckComponent case, it is
when building the spelling index. In the ClusteringComponent, it is possible
to ask for document clusters without running any query (it will also be
possible to get clusters _with_ a query, and this is distinct from the
handling of search-results clustering). Thus, it seems really weird to have
to pass in a dummy query, yet that is what one has to do in order to avoid
getting an NPE in the QC.

Now, I suppose these pieces could be modeled as something else, or it's
possible to split the two functionalities into separate things (1
ReqHandler, 1 SearchComp). In fact, the said functionality is not really
"search" functionality, or SearchComponent functionality, yet much of the
rest of the functionality in the code in question is "search" functionality
and logically belongs in a SearchComponent. In the case of the SCC build,
it's akin to an indexing operation. In the clustering case, it's a query,
albeit a non-traditional one. In some sense, this kind of document
clustering is like non-query-based faceting, which leads to more
navigation/browsing instead of searching.

The quick fix is to just put null checks into the QC or pass in a dummy
query with rows=0, but I'm not sure if there isn't a slightly bigger picture
here that needs adjusting in terms of SearchComponents. Namely, must the QC
always be on? And should we think a little more about components that don't
require a query in order to function, and how they play in the scheme of
things?

Thoughts?  Recommendations?

-Grant
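One way to frame the design question Grant raises is to let components declare whether they need a parsed query at all. A minimal sketch under that assumption; the Component interface below is hypothetical and is not Solr's SearchComponent API:

```java
import java.util.List;

/** Hypothetical component contract: each component declares whether it needs a query. */
interface Component {
    boolean requiresQuery();
    String process(String query); // query may be null when requiresQuery() is false
}

public class PipelineDemo {
    /** Run each component, failing fast only for those that actually need a query. */
    static void run(List<Component> components, String q) {
        for (Component c : components) {
            if (c.requiresQuery() && (q == null || q.isEmpty())) {
                throw new IllegalArgumentException("missing q parameter");
            }
            System.out.println(c.process(q));
        }
    }

    public static void main(String[] args) {
        // A spell-index build step is like indexing: it has no use for a query.
        Component spellBuild = new Component() {
            public boolean requiresQuery() { return false; }
            public String process(String q) { return "built spelling index"; }
        };
        // No q parameter needed: a query-free component runs without a dummy query.
        run(List.of(spellBuild), null);
    }
}
```

With a contract like this, the pipeline rather than each component decides whether a missing query is an error, which avoids both the NPE and the rows=0 dummy-query workaround.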


Re: Must QueryComponent always be on and other Design Questions

2008-10-20 Thread Grant Ingersoll

For completeness, here's the NPE:
SEVERE: java.lang.NullPointerException
	at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:37)
	at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
	at org.apache.solr.search.QParser.getQuery(QParser.java:88)
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:82)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:149)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.handler.clustering.ClusteringComponentTest.testComponent(ClusteringComponentTest.java:70)


Don't worry about the ClusteringComponentTest yet, I haven't posted  
that code yet.





Re: dynamic filtering

2008-10-20 Thread xibin

Thanks for the information. Looks like it is still being integrated. Do you
know what state it is in? I am going to take a look at the contrib and see
where it is at.

Thanks for your help,
Xibin


ryantxu wrote:
> 
> check:
> https://issues.apache.org/jira/browse/LUCENE-1387
> 
> On Oct 20, 2008, at 6:56 PM, xibin wrote:
> 
>>
>> Thanks for the reply.
>>
>> Let's say that I have a location code field in the document. Say there are
>> 2000 indexed documents with various location codes. When a user searches,
>> she specifies her own location, as well as the maximum radius that she
>> wants the distance to be, in addition to other search fields that she may
>> specify.
>>
>> In order to satisfy this scenario, I am thinking about performing the
>> search using all the other fields normally, but using the distance to
>> dynamically filter out unwanted results. This way the distance calculation
>> performs the filtering, and the relevancy from the other search fields
>> does not change...
>>
>> Let me know if this makes sense to you.
>>
>> Thanks
>> Xibin
>>

-- 
View this message in context: 
http://www.nabble.com/dynamic-filtering-tp20079841p20082651.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
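The radius check described in this thread could live inside such a collect-time filter. Below is a hedged sketch using the haversine great-circle formula; the coordinates are examples, and in practice the document's lat/lon would come from indexed fields rather than method arguments:

```java
/** Sketch of a distance-based accept/reject check for collect-time filtering. */
public class DistanceFilterDemo {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine great-circle distance between two points, in kilometers. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Accept a document iff it lies within maxRadiusKm of the user's location. */
    static boolean withinRadius(double userLat, double userLon,
                                double docLat, double docLon, double maxRadiusKm) {
        return haversineKm(userLat, userLon, docLat, docLon) <= maxRadiusKm;
    }

    public static void main(String[] args) {
        // New York -> Philadelphia is roughly 130 km: a 200 km radius keeps
        // the document, a 100 km radius drops it.
        System.out.println(withinRadius(40.71, -74.01, 39.95, -75.17, 200)); // true
        System.out.println(withinRadius(40.71, -74.01, 39.95, -75.17, 100)); // false
    }
}
```

Because the check only rejects documents, the scores of the surviving hits are untouched, which matches the requirement that relevancy from the other search fields not change.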



Re: Must QueryComponent always be on and other Design Questions

2008-10-20 Thread Otis Gospodnetic
This is related to something I must have only day dreamed (dreamt?) about, but
not actually mentioned on solr-dev.
My feeling is that we are moving Solr in the direction of a more general web
service that can host various NLP and ML components, and no longer only do
IR/Lucene. We see that in a few patches that Grant is cooking, I think we'll
see it in the Solr+Mahout marriage down the road, and so on.

Is it time to start thinking about Solr as a server for IR and ML and NLP
tasks, and to see how the tightly coupled Lucene can be made more pluggable?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Grant Ingersoll <[EMAIL PROTECTED]>
> To: solr-dev@lucene.apache.org
> Sent: Monday, October 20, 2008 7:56:32 PM
> Subject: Must QueryComponent always be on and other Design Questions



Re: immediately commit of docs doesn't work in multiCore case

2008-10-20 Thread Parisa

Yes, I have this tag in the solrconfig.xml of all cores.





Parisa 



ryantxu wrote:
> 
> Do you have the XmlUpdateRequestHandler mapped to /update?
> 
> if that fixes it, we should make a bigger note on:
> http://wiki.apache.org/solr/Solrj
> 
> ryan
> 

-- 
View this message in context: 
http://www.nabble.com/immediatley-commit-of-docs-doesn%27t-work-in-multiCore-case-tp20069593p20084198.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)

2008-10-20 Thread Akshay K. Ukey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshay K. Ukey updated SOLR-561:


Attachment: SOLR-561.patch

Another minor fix in the replication admin page.

> Solr replication by Solr (for windows also)
> ---
>
> Key: SOLR-561
> URL: https://issues.apache.org/jira/browse/SOLR-561
> Project: Solr
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 1.4
> Environment: All
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: deletion_policy.patch, SOLR-561-core.patch, 
> SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, 
> SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
> SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
> SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch
>
>
> The current replication strategy in Solr involves shell scripts. The
> following are the drawbacks of this approach:
> * It does not work on Windows
> * Replication works as a separate piece, not integrated with Solr
> * Cannot control replication from Solr admin/JMX
> * Each operation requires a manual telnet to the host
> Doing the replication in Java has the following advantages:
> * Platform independence
> * Manual steps can be completely eliminated. Everything can be driven from
> solrconfig.xml.
> ** Adding the URL of the master in the slaves should be good enough to enable
> replication. Other things like the frequency of snapshoot/snappull can also
> be configured. All other information can be obtained automatically.
> * Start/stop can be triggered from solr/admin or JMX
> * Can get the status/progress while replication is going on. It can also 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.