[jira] Commented: (SOLR-1163) Solr Explorer - A generic GWT client for Solr

2009-08-21 Thread Uri Boness (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746395#action_12746395
 ] 

Uri Boness commented on SOLR-1163:
--

bq. Does a GWT client application have a clean license?
If having a pure Apache 2 license is considered to be clean, then yes.

bq. Are there any other GWT apps in the Apache project? 
Not as far as I know. But you do have 
[LucidGaze|http://www.lucidimagination.com/Downloads/Certified-Distributions#lucidgaze],
 which is a Solr monitoring tool, and I think it's also a GWT application.

bq. +1. This is great.
Thanks, you can also vote for it ;-)

bq. The Simile project has some nice data explorer UIs. The Simile-Widget 
gallery displays them.
Thanks for the suggestion. I know this project, but in my experience some of 
their widgets don't perform really well. Personally, when it comes to data 
visualization I think Flash is the best technology we have at the moment, and 
it's quite easy to interact with it via JavaScript and GWT (that's how Google 
does it for most of its applications/services: Analytics, Finance, etc.).

> Solr Explorer - A generic GWT client for Solr
> -
>
> Key: SOLR-1163
> URL: https://issues.apache.org/jira/browse/SOLR-1163
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Affects Versions: 1.3
>Reporter: Uri Boness
> Attachments: graphics.zip, solr-explorer.patch, solr-explorer.patch
>
>
> The attached patch is a generic GWT client for Solr. It is currently 
> standalone, meaning that once built, one can open the generated HTML file in 
> a browser and communicate with any deployed Solr. It is configured with its 
> own configuration file, where one can configure the Solr instance/core to 
> connect to. Since it's currently standalone and completely client-side based, 
> it uses JSON with padding (cross-site scripting) to connect to remote Solr 
> servers; a sample request appears after this description. Some of the 
> supported features:
> - Simple query search
> - Sorting - one can dynamically define new sort criteria
> - Search results are rendered very much like Google search results are 
> rendered. It is also possible to view all stored field values for every hit. 
> - Custom hit rendering - It is possible to show thumbnails (images) per hit 
> and also customize a view for a hit based on html templates
> - Faceting - one can dynamically define field and query facets via the UI. It 
> is also possible to pre-configure these facets in the configuration file.
> - Highlighting - you can dynamically configure highlighting. It can also be 
> pre-configured in the configuration file.
> - Spellchecking - you can dynamically configure spell checking. Can also be 
> done in the configuration file. Supports collation. It is also possible to 
> send "build" and "reload" commands.
> - Data import handler - if used, it is possible to send a "full-import" and 
> "status" command ("delta-import" is not implemented yet, but it's easy to add)
> - Console - For development time, there's a small console which can help to 
> better understand what's going on behind the scenes. One can use it to:
> ** view the client logs
> ** browse the Solr schema
> ** view a breakdown of the current search context
> ** view a breakdown of the query URL that is sent to Solr
> ** view the raw JSON response returned from Solr
> This client is actually a platform that can be greatly extended for more 
> things. The goal is to have a client where the explorer part is just one view 
> of it. Other future views include: Monitoring, Administration, Query Builder, 
> DataImportHandler configuration, and more...
> To get a better view of what's currently possible, we've set up a public 
> version of this client at: http://search.jteam.nl/explorer. This client is 
> configured with one Solr instance where crawled YouTube movies were indexed. 
> You can also check out a screencast for this deployed client: 
> http://search.jteam.nl/help
> The patch creates a new folder in the contrib directory. Since the patch 
> doesn't contain binaries, an additional zip file is provided that needs to be 
> extracted to add all the required graphics. This module is maven2 based and is 
> configured in such a way that all GWT related tools/libraries are 
> automatically downloaded when the module is compiled. One of the artifacts 
> of the build is a war file which can be deployed in any servlet container.
> NOTE: this client works best on WebKit based browsers (for performance 
> reasons) but also works on Firefox and IE 7+. That said, it should be taken 
> into account that it is still under development.
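
As a concrete illustration of the JSONP mechanism described above: Solr's JSON 
response writer can wrap its output in a callback function named by the 
json.wrf parameter, so a purely client-side page can load results cross-origin 
with a request along these lines (host, query, and callback name are 
placeholders):

    http://localhost:8983/solr/select?q=video&wt=json&json.wrf=handleResponse

Solr then returns handleResponse({...}) as the response body, which the 
browser executes as script.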

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1378) Add reference to Packt's Solr book.

2009-08-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746392#action_12746392
 ] 

Yonik Seeley commented on SOLR-1378:


David, I needed to put   tags around the book image in the news section.
"forrest run" (interactive mode) does not detect all the errors that the 
straight "forrest" will.

That said, I can't build the current site myself... not even on a clean 
checkout with your patch not applied (I've only tried forrest 0.8 so far).  
Anyone else?

> Add reference to Packt's Solr book.
> ---
>
> Key: SOLR-1378
> URL: https://issues.apache.org/jira/browse/SOLR-1378
> Project: Solr
>  Issue Type: Task
>Reporter: David Smiley
> Attachments: solr-book-image.jpg, solr_book_packt.patch
>
>
> I've attached news of the Solr update.  It includes an image under the left 
> nav area, and a news item with the same image.  The text is as follows:
> David Smiley and Eric Pugh are proud to introduce the first book on Solr, 
> "Solr 1.4 Enterprise Search Server" from Packt Publishing.
> This book is a comprehensive reference guide for nearly every feature Solr 
> has to offer. It serves the reader right from initiation to development to 
> deployment. It also comes with complete running examples to demonstrate its 
> use and show how to integrate it with other languages and frameworks.
> To keep this interesting and realistic, it uses a large open source set of 
> metadata about artists, releases, and tracks courtesy of the MusicBrainz.org 
> project. Using this data as a testing ground for Solr, you will learn how to 
> import this data in various ways from CSV to XML to database access. You will 
> then learn how to search this data in a myriad of ways, including Solr's rich 
> query syntax, "boosting" match scores based on record data and other means, 
> searching across multiple fields with different boosts, getting facets 
> on the results, auto-completing user queries, spell-correcting searches, 
> highlighting queried text in search results, and so on.
> After this thorough tour, you'll see working examples of integrating a 
> variety of technologies with Solr such as Java, JavaScript, Drupal, Ruby, 
> PHP, and Python.
> Finally, this book covers various deployment considerations, including 
> indexing strategies and performance-oriented configuration that will enable 
> you to scale Solr to meet the needs of a high-volume site. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1375) BloomFilter on a field

2009-08-21 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1375:
---

Attachment: SOLR-1375.patch

* Bug fixes

* Core name included in response

* Wiki is located at http://wiki.apache.org/solr/BloomIndexComponent

> BloomFilter on a field
> --
>
> Key: SOLR-1375
> URL: https://issues.apache.org/jira/browse/SOLR-1375
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1375.patch, SOLR-1375.patch, SOLR-1375.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> * A bloom filter is a read-only probabilistic set. It's useful
> for verifying that a key exists in a set, though it can return false
> positives. http://en.wikipedia.org/wiki/Bloom_filter 
> * The use case is indexing in Hadoop and checking for duplicates
> against a Solr cluster, which (when using the term dictionary or a
> query) is too slow and exceeds the time consumed for indexing.
> When a match is found, the host, segment, and term are returned.
> If the same term is found on multiple servers, multiple results
> are returned by the distributed process. (We'll need to add in
> the core name, I just realized.) 
> * When new segments are created, and commit is called, a new
> bloom filter is generated from a given field (default:id) by
> iterating over the term dictionary values. There's a bloom
> filter file per segment, which is managed on each Solr shard.
> When segments are merged away, their corresponding .blm files are
> also removed. In a future version we'll have a central server
> for the bloom filters so we're not abusing the thread pool of
> the Solr proxy and the networking of the Solr cluster (this will
> be done sooner than later after testing this version). I held
> off because the central server requires syncing the Solr
> servers' files (which is like reverse replication). 
> * The patch uses the BloomFilter from Hadoop 0.20 (see the sketch after
> this list). I want to jar up only the necessary classes so we don't
> have a giant Hadoop jar in lib.
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
> * Distributed code is added and seems to work; I extended
> TestDistributedSearch to test over multiple HTTP servers. I
> chose this approach rather than the manual method used by (for
> example) TermVectorComponent.testDistributed because I'm new to
> Solr's distributed search and wanted to learn how it works (the
> stages are confusing). Using this method, I didn't need to set up
> multiple Tomcat servers and manually execute tests.
> * We need more of the bloom filter options to be passable via
> solrconfig.
> * I'll add more test cases.
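
To make the mechanics concrete, here is a minimal sketch of the Hadoop 0.20 
BloomFilter API the patch builds on; the vector size, hash count, and ids are 
illustrative placeholders, not the patch's actual settings:

    import org.apache.hadoop.util.bloom.BloomFilter;
    import org.apache.hadoop.util.bloom.Key;
    import org.apache.hadoop.util.hash.Hash;

    public class BloomSketch {
      public static void main(String[] args) throws Exception {
        // one filter per segment, sized for the expected number of ids
        BloomFilter filter = new BloomFilter(1 << 20, 4, Hash.MURMUR_HASH);

        // populate by iterating the id field's term dictionary
        // (a stand-in loop here)
        for (String id : new String[] {"doc-1", "doc-2"}) {
          filter.add(new Key(id.getBytes("UTF-8")));
        }

        // never a false negative, occasionally a false positive
        System.out.println(filter.membershipTest(new Key("doc-3".getBytes("UTF-8"))));
      }
    }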

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: distributed search components

2009-08-21 Thread Mike Anderson

I was working on the MLT component with the patch from SOLR-788.


On Aug 21, 2009, at 6:49 PM, Yonik Seeley wrote:


On Fri, Aug 21, 2009 at 6:35 PM, Mike Anderson wrote:
I've been trying to dissect the MLT component and understand how it
works. Every time I think I have the process figured out, I somehow just
end up more confused.


I don't think MLT supports distributed search.

http://wiki.apache.org/solr/DistributedSearch

-Yonik
http://www.lucidimagination.com




[jira] Commented: (SOLR-1369) Add HSQLDB Jar to example-dih

2009-08-21 Thread Eric Pugh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746363#action_12746363
 ] 

Eric Pugh commented on SOLR-1369:
-

I tweaked the docs to point to HSQLDB 1.8.  I'll leave the "unzip hsqldb.zip"  
and "svn add hsqldb/" and "svn ci -m 'expanding example to make getting started 
easier' hsqldb/" to a committer versus attaching a large patch file!

Eric


> Add HSQLDB Jar to example-dih
> -
>
> Key: SOLR-1369
> URL: https://issues.apache.org/jira/browse/SOLR-1369
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Eric Pugh
>
> I went back to show someone the Example-DIH and followed the wiki page 
> directions.  I then ran into an error because the example uses HSQLDB 1.8, and 
> the hsqldb.jar I downloaded from hsqldb.org was 1.9.  The 1.9 RC shows up above 
> the 1.8 version.
> I see two approaches:  1) Be clearer in the docs, maybe embed a direct link 
> to 
> http://sourceforge.net/projects/hsqldb/files/hsqldb/hsqldb_1_8_0/hsqldb_1_8_0_10.zip/download.
> 2) Include hsqldb.jar in the example.  I am assuming the reason this wasn't 
> done was because of licensing issues? 
> Also, any real reason to zip the hsqldb database?  It's under 20k expanded 
> and adds another step.
> Figured I'd get the wisdom of the crowds before changing.
> Eric

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: distributed search components

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 12:52 PM, Mike Anderson wrote:
> I'm trying to make my way through learning how to modify and write
> distributed search components.

The whole ResponseBuilder stuff is really a first pass - it obviously
could use refinement.  As you go through, it would be great if you
could keep in mind how things could be improved, in addition to how it
currently works.  Don't try to make sense of this as anyone's idea of
"ideal code" but rather "code that currently works".

> A few questions
>
> 1. in SearchHandler, when the query is broken down and sent to each shard,
> will this request make its way to the process() method of the component
> (because it will look like a non-distributed request to the SearchHandler of
> the shard)?

Yes.

> 2. the comment above the response handling loop (in SearchHandler) says that
> if any requests are added while in the loop, the loop will break and make
> the request immediately. I see that the loop will exit if there is an
> exception or if there are no more responses, but I don't see how the new
> requests will be called unless it goes through the entire loop again.

Here's the code.
  // now wait for replies, but if anyone puts more requests on
  // the outgoing queue, send them out immediately (by exiting
  // this loop)
  while (rb.outgoing.size() == 0) {
[ receive a response, and process the response ]
  }
If any code processing the response adds another request to the
outgoing queue, then the loop will break and the new outgoing requests
will be sent.

So it's not *quite* immediate... it's after components have processed
the response.

> 3. if one adds a request to rb in the handleResponses method, this wouldn't
> necessarily be called, namely in the event that none of the components
> override the distributedProcess method, and the loop only goes through once.
>
> 4. where can I learn more about the shard.purpose variable? Where in the
> component should this be set, if anywhere?

  public final static int PURPOSE_PRIVATE         = 0x01;
  public final static int PURPOSE_GET_TERM_DFS    = 0x02;
  public final static int PURPOSE_GET_TOP_IDS     = 0x04;
  public final static int PURPOSE_REFINE_TOP_IDS  = 0x08;
  public final static int PURPOSE_GET_FACETS      = 0x10;
  public final static int PURPOSE_REFINE_FACETS   = 0x20;
  public final static int PURPOSE_GET_FIELDS      = 0x40;
  public final static int PURPOSE_GET_HIGHLIGHTS  = 0x80;
  public final static int PURPOSE_GET_DEBUG       = 0x100;
  public final static int PURPOSE_GET_STATS       = 0x200;

  public int purpose;  // the purpose of this request

It's for declaring what a request is for, so other components can
piggyback on that request if they want and avoid sending a separate
request. For example, the highlighting component chooses to request
highlighting only by piggybacking on requests to retrieve stored
fields.

// Turn on highlighting only when retrieving fields
if ((sreq.purpose & ShardRequest.PURPOSE_GET_FIELDS) != 0) {
  sreq.purpose |= ShardRequest.PURPOSE_GET_HIGHLIGHTS;
  // should already be true...
  sreq.params.set(HighlightParams.HIGHLIGHT, "true");
}

The facet component will also look for other suitable outgoing
requests to piggyback on and modify, and if it can't find any, will
create a new request.  See FacetComponent.java:134

Some of these are currently unused; PURPOSE_GET_TERM_DFS, for example,
would be for getting the doc freqs to implement global idf.
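
A minimal sketch of how these flags compose and get tested; the constants
mirror the ShardRequest values quoted above, and the hex reading also
explains logged values like 401 (if purposes are printed in hex, that is
some 0x400 bit OR-ed with PURPOSE_PRIVATE = 0x01):

    public class PurposeSketch {
      static final int PURPOSE_PRIVATE        = 0x01;
      static final int PURPOSE_GET_TOP_IDS    = 0x04;
      static final int PURPOSE_GET_HIGHLIGHTS = 0x80;

      public static void main(String[] args) {
        int purpose = PURPOSE_GET_TOP_IDS | PURPOSE_PRIVATE;
        // combined flags read naturally in hex: 0x04 | 0x01 prints as "5"
        System.out.println(Integer.toHexString(purpose));
        // piggyback test, as in the highlighting example above
        if ((purpose & PURPOSE_GET_TOP_IDS) != 0) {
          purpose |= PURPOSE_GET_HIGHLIGHTS;
        }
      }
    }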

-Yonik
http://www.lucidimagination.com


> I've taken a look at the wiki page, but if there is more documentation
> elsewhere please point me towards it.
>
> Thanks in advance,
> Mike


Re: distributed search components

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 6:35 PM, Mike Anderson wrote:
> I've been trying to dissect the MLT component and understand how it works.
> Every time I think I have the process figured out, I somehow just end up
> more confused.

I don't think MLT supports distributed search.

http://wiki.apache.org/solr/DistributedSearch

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-1377) Force TokenizerFactory to create a Tokenizer rather then TokenStream

2009-08-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746311#action_12746311
 ] 

Yonik Seeley commented on SOLR-1377:


bq. For the Pattern implementation, all the tokens are created beforehand and 
are just passed off with iter.next(), so if the input changes, the whole thing 
would need to change. 

And it does now... I moved the creation of the Token to init() so it's 
recreated with every reset.

bq. Any reason not to implement reset on: TrieTokenizerFactory?
TrieTokenizer (right below the factory) already implements reset(Reader).
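
A minimal sketch of the reuse pattern under discussion, assuming Lucene 2.9's
Tokenizer.reset(Reader); the class and the init() hook are illustrative, not
the patch's actual code:

    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Tokenizer;

    public abstract class ReusableTokenizerSketch extends Tokenizer {
      protected ReusableTokenizerSketch(Reader input) {
        super(input);
        init();
      }

      // reset(Reader) installs the new input and rebuilds any state
      // derived from the previous one, so the instance can be reused
      @Override
      public void reset(Reader input) throws IOException {
        super.reset(input);
        init();
      }

      protected abstract void init(); // stands in for tokenizer-specific setup
    }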

> Force TokenizerFactory to create a Tokenizer rather then TokenStream 
> -
>
> Key: SOLR-1377
> URL: https://issues.apache.org/jira/browse/SOLR-1377
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-1377-Tokenizer.patch, SOLR-1377.patch
>
>
> The new token reuse classes require that they are created with a Tokenizer.  
> The solr TokenizerFactory interface currently makes a TokenStream.
> Although this is an API breaking change, the alternative is to just document 
> that it needs to be a Tokenizer instance and throw an error when it is not.
> For more discussion, see:
> http://www.lucidimagination.com/search/document/272b8c4e6198d887/trunk_classcastexception_with_basetokenizerfactory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: distributed search components

2009-08-21 Thread Mike Anderson
I've been trying to dissect the MLT component and understand how it
works. Every time I think I have the process figured out, I somehow
just end up more confused. Here is my best guess so far at how the
process and flow work:


1. request comes in, and is routed to distributed section of  
SearchHandler

2. request is sent to each shard
3. after the shard returns a list of Doc IDs, new MLT requests are  
created, one for each Doc ID. (this happens in responseHandler())
4. each MLT request is processed on the same shard (this happens in  
process())
5. shard returns MLT results, which are collated  (this happens in  
finishedStage())


although I don't think this is quite right because it doesn't match my
print statements. I also noticed that the Purpose isn't 400 but 401.
What's up with this? Is 401 a code for something else?


(as an aside, is it unsafe to assume that the logs will appear in  
actual chronological order?)



Any advice or pointers at this point would be greatly appreciated... I
think I'm going in circles.



-mike



On Aug 21, 2009, at 12:54 PM, Jason Rutherglen wrote:


Mike,

I'm also finding the Solr distributed process to be confusing.  Let's
try to add things to the wiki as we learn them?

-J

On Fri, Aug 21, 2009 at 9:52 AM, Mike Anderson wrote:

I'm trying to make my way through learning how to modify and write
distributed search components.

A few questions

1. in SearchHandler, when the query is broken down and sent to each
shard, will this request make its way to the process() method of the
component (because it will look like a non-distributed request to the
SearchHandler of the shard)?

2. the comment above the response handling loop (in SearchHandler)
says that if any requests are added while in the loop, the loop will
break and make the request immediately. I see that the loop will exit
if there is an exception or if there are no more responses, but I
don't see how the new requests will be called unless it goes through
the entire loop again.

3. if one adds a request to rb in the handleResponses method, this
wouldn't necessarily be called, namely in the event that none of the
components override the distributedProcess method, and the loop only
goes through once.


4. where can I learn more about the shard.purpose variable? Where in
the component should this be set, if anywhere?


I've taken a look at the wiki page, but if there is more documentation
elsewhere please point me towards it.

Thanks in advance,
Mike






[jira] Commented: (SOLR-1377) Force TokenizerFactory to create a Tokenizer rather then TokenStream

2009-08-21 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746287#action_12746287
 ] 

Ryan McKinley commented on SOLR-1377:
-

Is reset guaranteed to be called on the same Reader?  For the Pattern 
implementation, all the tokens are created beforehand and are just passed off 
with iter.next(), so if the input changes, the whole thing would need to change.

+  public void reset(Reader input) throws IOException {
+    super.reset(input);
+    init();
+  }

Any reason not to implement reset on: TrieTokenizerFactory?

> Force TokenizerFactory to create a Tokenizer rather then TokenStream 
> -
>
> Key: SOLR-1377
> URL: https://issues.apache.org/jira/browse/SOLR-1377
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-1377-Tokenizer.patch, SOLR-1377.patch
>
>
> The new token reuse classes require that they are created with a Tokenizer.  
> The solr TokenizerFactory interface currently makes a TokenStream.
> Although this is an API breaking change, the alternative is to just document 
> that it needs to be a Tokenizer instance and throw an error when it is not.
> For more discussion, see:
> http://www.lucidimagination.com/search/document/272b8c4e6198d887/trunk_classcastexception_with_basetokenizerfactory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1377) Force TokenizerFactory to create a Tokenizer rather then TokenStream

2009-08-21 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1377:
---

Attachment: SOLR-1377.patch

Uploading another patch based on yours that implements reuse (reset(Reader)) 
for the Tokenizers.

+1

> Force TokenizerFactory to create a Tokenizer rather then TokenStream 
> -
>
> Key: SOLR-1377
> URL: https://issues.apache.org/jira/browse/SOLR-1377
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-1377-Tokenizer.patch, SOLR-1377.patch
>
>
> The new token reuse classes require that they are created with a Tokenizer.  
> The solr TokenizerFactory interface currently makes a TokenStream.
> Although this is an API breaking change, the alternative is to just document 
> that it needs to be a Tokenizer instance and throw an error when it is not.
> For more discussion, see:
> http://www.lucidimagination.com/search/document/272b8c4e6198d887/trunk_classcastexception_with_basetokenizerfactory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1163) Solr Explorer - A generic GWT client for Solr

2009-08-21 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746261#action_12746261
 ] 

Lance Norskog commented on SOLR-1163:
-

Does a GWT client application have a clean license? Are there any other GWT 
apps in the Apache project?

+1. This is great.

The [Simile|http://simile.mit.edu/] project has some nice data explorer UIs. 
The [Simile-Widget|http://www.simile-widgets.org/] gallery displays them.


> Solr Explorer - A generic GWT client for Solr
> -
>
> Key: SOLR-1163
> URL: https://issues.apache.org/jira/browse/SOLR-1163
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Affects Versions: 1.3
>Reporter: Uri Boness
> Attachments: graphics.zip, solr-explorer.patch, solr-explorer.patch
>
>
> The attached patch is a generic GWT client for Solr. It is currently 
> standalone, meaning that once built, one can open the generated HTML file in 
> a browser and communicate with any deployed Solr. It is configured with its 
> own configuration file, where one can configure the Solr instance/core to 
> connect to. Since it's currently standalone and completely client-side based, 
> it uses JSON with padding (cross-site scripting) to connect to remote Solr 
> servers. Some of the supported features:
> - Simple query search
> - Sorting - one can dynamically define new sort criteria
> - Search results are rendered very much like Google search results are 
> rendered. It is also possible to view all stored field values for every hit. 
> - Custom hit rendering - It is possible to show thumbnails (images) per hit 
> and also customize a view for a hit based on html templates
> - Faceting - one can dynamically define field and query facets via the UI. It 
> is also possible to pre-configure these facets in the configuration file.
> - Highlighting - you can dynamically configure highlighting. It can also be 
> pre-configured in the configuration file.
> - Spellchecking - you can dynamically configure spell checking. Can also be 
> done in the configuration file. Supports collation. It is also possible to 
> send "build" and "reload" commands.
> - Data import handler - if used, it is possible to send a "full-import" and 
> "status" command ("delta-import" is not implemented yet, but it's easy to add)
> - Console - For development time, there's a small console which can help to 
> better understand what's going on behind the scenes. One can use it to:
> ** view the client logs
> ** browse the Solr schema
> ** view a breakdown of the current search context
> ** view a breakdown of the query URL that is sent to Solr
> ** view the raw JSON response returned from Solr
> This client is actually a platform that can be greatly extended for more 
> things. The goal is to have a client where the explorer part is just one view 
> of it. Other future views include: Monitoring, Administration, Query Builder, 
> DataImportHandler configuration, and more...
> To get a better view of what's currently possible, we've set up a public 
> version of this client at: http://search.jteam.nl/explorer. This client is 
> configured with one Solr instance where crawled YouTube movies were indexed. 
> You can also check out a screencast for this deployed client: 
> http://search.jteam.nl/help
> The patch creates a new folder in the contrib directory. Since the patch 
> doesn't contain binaries, an additional zip file is provided that needs to be 
> extracted to add all the required graphics. This module is maven2 based and is 
> configured in such a way that all GWT related tools/libraries are 
> automatically downloaded when the module is compiled. One of the artifacts 
> of the build is a war file which can be deployed in any servlet container.
> NOTE: this client works best on WebKit based browsers (for performance 
> reasons) but also works on Firefox and IE 7+. That said, it should be taken 
> into account that it is still under development.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1378) Add reference to Packt's Solr book.

2009-08-21 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-1378:
---

Attachment: solr-book-image.jpg
solr_book_packt.patch

The image goes here: 
src\site\src\documentation\content\xdocs\images\solr-book-image.jpg

> Add reference to Packt's Solr book.
> ---
>
> Key: SOLR-1378
> URL: https://issues.apache.org/jira/browse/SOLR-1378
> Project: Solr
>  Issue Type: Task
>Reporter: David Smiley
> Attachments: solr-book-image.jpg, solr_book_packt.patch
>
>
> I've attached news of the Solr update.  It includes an image under the left 
> nav area, and a news item with the same image.  The text is as follows:
> David Smiley and Eric Pugh are proud to introduce the first book on Solr, 
> "Solr 1.4 Enterprise Search Server" from Packt Publishing.
> This book is a comprehensive reference guide for nearly every feature Solr 
> has to offer. It serves the reader right from initiation to development to 
> deployment. It also comes with complete running examples to demonstrate its 
> use and show how to integrate it with other languages and frameworks.
> To keep this interesting and realistic, it uses a large open source set of 
> metadata about artists, releases, and tracks courtesy of the MusicBrainz.org 
> project. Using this data as a testing ground for Solr, you will learn how to 
> import this data in various ways from CSV to XML to database access. You will 
> then learn how to search this data in a myriad of ways, including Solr's rich 
> query syntax, "boosting" match scores based on record data and other means, 
> searching across multiple fields with different boosts, getting facets 
> on the results, auto-completing user queries, spell-correcting searches, 
> highlighting queried text in search results, and so on.
> After this thorough tour, you'll see working examples of integrating a 
> variety of technologies with Solr such as Java, JavaScript, Drupal, Ruby, 
> PHP, and Python.
> Finally, this book covers various deployment considerations, including 
> indexing strategies and performance-oriented configuration that will enable 
> you to scale Solr to meet the needs of a high-volume site. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1377) Force TokenizerFactory to create a Tokenizer rather then TokenStream

2009-08-21 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-1377:


Attachment: SOLR-1377-Tokenizer.patch

Here is a patch that:

1. Changes the TokenizerFactory to return a Tokenizer
2. Updates all TokenizerFactory classes to explicitly return a Tokenizer
3. Changes the PatternTokenizerFactory to return a Tokenizer
4. Adds a test that calls PatternTokenizer

- - -

Since this is an API breaking change, I added this to the "Upgrading from Solr 
1.3" section of CHANGES.txt:
{panel}
The TokenizerFactory API has changed to explicitly return a Tokenizer rather 
than a TokenStream (which may or may not be a Tokenizer).  This change is 
required to take advantage of the Token reuse improvements in Lucene 2.9.  For 
more information, see SOLR-1377. 
{panel}

I'll wait for two +1 votes on this, since it does break back compatibility.
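
For clarity, the signature change in question, sketched from the description
above (only the return type changes):

    // Solr 1.3 TokenizerFactory:
    public TokenStream create(Reader input);

    // after this patch:
    public Tokenizer create(Reader input);

Since Tokenizer extends TokenStream, code that consumes the factory's result
keeps compiling; it is factory implementations returning a bare TokenStream
that break.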


> Force TokenizerFactory to create a Tokenizer rather then TokenStream 
> -
>
> Key: SOLR-1377
> URL: https://issues.apache.org/jira/browse/SOLR-1377
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-1377-Tokenizer.patch
>
>
> The new token reuse classes require that they are created with a Tokenizer.  
> The solr TokenizerFactory interface currently makes a TokenStream.
> Although this is an API breaking change, the alternative is to just document 
> that it needs to be a Tokenizer instance and throw an error when it is not.
> For more discussion, see:
> http://www.lucidimagination.com/search/document/272b8c4e6198d887/trunk_classcastexception_with_basetokenizerfactory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1378) Add reference to Packt's Solr book.

2009-08-21 Thread David Smiley (JIRA)
Add reference to Packt's Solr book.
---

 Key: SOLR-1378
 URL: https://issues.apache.org/jira/browse/SOLR-1378
 Project: Solr
  Issue Type: Task
Reporter: David Smiley


I've attached news of the Solr update.  It includes an image under the left nav 
area, and a news item with the same image.  The text is as follows:

David Smiley and Eric Pugh are proud to introduce the first book on Solr, "Solr 
1.4 Enterprise Search Server" from Packt Publishing.

This book is a comprehensive reference guide for nearly every feature Solr has 
to offer. It serves the reader right from initiation to development to 
deployment. It also comes with complete running examples to demonstrate its use 
and show how to integrate it with other languages and frameworks.

To keep this interesting and realistic, it uses a large open source set of 
metadata about artists, releases, and tracks courtesy of the MusicBrainz.org 
project. Using this data as a testing ground for Solr, you will learn how to 
import this data in various ways from CSV to XML to database access. You will 
then learn how to search this data in a myriad of ways, including Solr's rich 
query syntax, "boosting" match scores based on record data and other means, 
searching across multiple fields with different boosts, getting facets on 
the results, auto-completing user queries, spell-correcting searches, 
highlighting queried text in search results, and so on.

After this thorough tour, you'll see working examples of integrating a variety 
of technologies with Solr such as Java, JavaScript, Drupal, Ruby, PHP, and 
Python.

Finally, this book covers various deployment considerations, including indexing 
strategies and performance-oriented configuration that will enable you to scale 
Solr to meet the needs of a high-volume site. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1375) BloomFilter on a field

2009-08-21 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1375:
---

Attachment: SOLR-1375.patch

* The Hadoop BloomFilter code is included in the patch



> BloomFilter on a field
> --
>
> Key: SOLR-1375
> URL: https://issues.apache.org/jira/browse/SOLR-1375
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1375.patch, SOLR-1375.patch
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> * A bloom filter is a read-only probabilistic set. It's useful
> for verifying that a key exists in a set, though it can return false
> positives. http://en.wikipedia.org/wiki/Bloom_filter 
> * The use case is indexing in Hadoop and checking for duplicates
> against a Solr cluster, which (when using the term dictionary or a
> query) is too slow and exceeds the time consumed for indexing.
> When a match is found, the host, segment, and term are returned.
> If the same term is found on multiple servers, multiple results
> are returned by the distributed process. (We'll need to add in
> the core name, I just realized.) 
> * When new segments are created, and commit is called, a new
> bloom filter is generated from a given field (default:id) by
> iterating over the term dictionary values. There's a bloom
> filter file per segment, which is managed on each Solr shard.
> When segments are merged away, their corresponding .blm files are
> also removed. In a future version we'll have a central server
> for the bloom filters so we're not abusing the thread pool of
> the Solr proxy and the networking of the Solr cluster (this will
> be done sooner than later after testing this version). I held
> off because the central server requires syncing the Solr
> servers' files (which is like reverse replication). 
> * The patch uses the BloomFilter from Hadoop 0.20. I want to jar
> up only the necessary classes so we don't have a giant Hadoop
> jar in lib.
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/bloom/BloomFilter.html
> * Distributed code is added and seems to work; I extended
> TestDistributedSearch to test over multiple HTTP servers. I
> chose this approach rather than the manual method used by (for
> example) TermVectorComponent.testDistributed because I'm new to
> Solr's distributed search and wanted to learn how it works (the
> stages are confusing). Using this method, I didn't need to set up
> multiple Tomcat servers and manually execute tests.
> * We need more of the bloom filter options to be passable via
> solrconfig.
> * I'll add more test cases.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1377) Force TokenizerFactory to create a Tokenizer rather then TokenStream

2009-08-21 Thread Ryan McKinley (JIRA)
Force TokenizerFactory to create a Tokenizer rather then TokenStream 
-

 Key: SOLR-1377
 URL: https://issues.apache.org/jira/browse/SOLR-1377
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 1.4


The new token reuse classes require that they are created with a Tokenizer.  
The solr TokenizerFactory interface currently makes a TokenStream.

Although this is an API breaking change, the alternative is to just document 
that it needs to be a Tokenizer instance and throw an error when it is not.

For more discussion, see:
http://www.lucidimagination.com/search/document/272b8c4e6198d887/trunk_classcastexception_with_basetokenizerfactory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: distributed search components

2009-08-21 Thread Jason Rutherglen
Mike,

I'm also finding the Solr distributed process to be confusing.  Let's
try to add things to the wiki as we learn them?

-J

On Fri, Aug 21, 2009 at 9:52 AM, Mike Anderson wrote:
> I'm trying to make my way through learning how to modify and write
> distributed search components.
>
> A few questions
>
> 1. in SearchHandler, when the query is broken down and sent to each shard,
> will this request make its way to the process() method of the component
> (because it will look like a non-distributed request to the SearchHandler of
> the shard)?
>
> 2. the comment above the response handling loop (in SearchHandler) says that
> if any requests are added while in the loop, the loop will break and make
> the request immediately. I see that the loop will exit if there is an
> exception or if there are no more responses, but I don't see how the new
> requests will be called unless it goes through the entire loop again.
>
> 3. if one adds a request to rb in the handleResponses method, this wouldn't
> necessarily be called, namely in the event that none of the components
> override the distributedProcess method, and the loop only goes through once.
>
> 4. where can I learn more about the shard.purpose variable? Where in the
> component should this be set, if anywhere?
>
>
> I've taken a look at the wiki page, but if there is more documentation
> elsewhere please point me towards it.
>
> Thanks in advance,
> Mike
>
>


distributed search components

2009-08-21 Thread Mike Anderson
I'm trying to make my way through learning how to modify and write  
distributed search components.


A few questions

1. in SearchHandler, when the query is broken down and sent to each  
shard, will this request make its way to the process() method of the
component (because it will look like a non-distributed request to the  
SearchHandler of the shard)?


2. the comment above the response handling loop (in SearchHandler)  
says that if any requests are added while in the loop, the loop will  
break and make the request immediately. I see that the loop will exit  
if there is an exception or if there are no more responses, but I
don't see how the new requests will be called unless it goes through  
the entire loop again.


3. if one adds a request to rb in the handleResponses method, this  
wouldn't necessarily be called, namely in the event that none of the  
components override the distributedProcess method, and the loop only  
goes through once.


4. where can I learn more about the shard.purpose variable? Where in  
the component should this be set, if anywhere?



I've taken a look at the wiki page, but if there is more documentation  
elsewhere please point me towards it.


Thanks in advance,
Mike



Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Mark Miller
Ryan McKinley wrote:
> Ahh, I see:
>  Tokenizer extends TokenStream
>
> So if this is going to break everything that implements TokenStream
> rather than Tokenizer, it seems we should change the TokenizerFactory
> API to:
>   public Tokenizer create( Reader input )
> rather than:
>   public TokenStream create( Reader input );
>
> I would WAY rather have my compiler tell me something is wrong than
> get an error and then find some documentation about the tokenizer.
>
> - - - - -
>
> Personally, I think lucene/solr just need to fess up and admit that
> 2.9 is *not* totally back compatible.  
I don't think anyone contends that Lucene is totally back compat - and
insofar as that goes there is no way Solr totally is - it exposes a lot
of Lucene.

We admit our breaks in this release in the back compat breaks section.
There is no way we will release claiming total back compat. Not even in
the realm of possibility.
> No way is the MultiReader change back-compatible!

Personally, pure API-wise, I think it was. It's a stickier issue on the
possibly higher RAM usage - but to me, that's more of a runtime change.
Certain methods have always changed over time in their resource usage,
and I think that's within back compat. This was a steep one to swallow
though, I'll admit. Basically we just thought it was well worth it long
term. And Hoss came up with some great ideas to help ease the possible pain.
>
> ryan
>
>
> On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote:
>
>> On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinley
>> wrote:
>>> I'm fine upgrading, but it seems we should make the 'back compatibility'
>>> notice more explicit.
>>
>> Yeah... that should be fun for expert-use plugins in general.  In
>> Lucene-land, this is the release of the "break"... I think we've
>> covered the changes reasonably well in our external APIs, but people
>> can always use pretty much the full Lucene API when writing Solr
>> plugins.
>>
>> I think we'll need to document that things in <tokenizer> tags need to
>> inherit from Tokenizer classes.  It is technically a back-compat
>> break, but I assume it will affect very few users?
>>
>> -Yonik
>> http://www.lucidimagination.com
>


-- 
- Mark

http://www.lucidimagination.com





Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 12:22 PM, Ryan McKinley wrote:
> Ahh, I see:
>  Tokenizer extends TokenStream
>
> So if this is going to break everything that implements TokenStream rather
> than Tokenizer, it seems we should change the TokenizerFactory API to:
>  public Tokenizer create( Reader input )
> rather than:
>  public TokenStream create( Reader input );
>
> I would WAY rather have my compiler tell me something is wrong than get an
> error and then find some documentation about the tokenizer.

+1
Absolutely.

-Yonik
http://www.lucidimagination.com


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Ryan McKinley

Ahh, I see:
 Tokenizer extends TokenStream

So if this is going to break everything that implements TokenStream
rather than Tokenizer, it seems we should change the TokenizerFactory
API to:

  public Tokenizer create( Reader input )
rather than:
  public TokenStream create( Reader input );

I would WAY rather have my compiler tell me something is wrong than
get an error and then find some documentation about the tokenizer.


- - - - -

Personally, I think lucene/solr just need to fess up and admit that
2.9 is *not* totally back compatible.  No way is the MultiReader
change back-compatible!


ryan


On Aug 21, 2009, at 11:39 AM, Yonik Seeley wrote:

On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinley  
wrote:

I'm fine upgrading, but it seems we should make the 'back compatibility'
notice more explicit.


Yeah... that should be fun for expert-use plugins in general.  In
Lucene-land, this is the release of the "break"... I think we've
covered the changes reasonably well in our external APIs, but people
can always use pretty much the full Lucene API when writing Solr
plugins.

I think we'll need to document that things in <tokenizer> tags need to
inherit from Tokenizer classes.  It is technically a back-compat
break, but I assume it will affect very few users?

-Yonik
http://www.lucidimagination.com




[jira] Commented: (SOLR-1376) invalid links to solr indexes after a new index is created

2009-08-21 Thread kiran sugana (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746035#action_12746035
 ] 

kiran sugana commented on SOLR-1376:


Hi Hoss, 

By incremental indexing I meant commits. I do not know for sure whether the 
list of deleted files grows over time. We noticed the issue when Solr became 
slow or unresponsive on a machine where Solr had not been restarted for a 
while. I will investigate whether the deleted files list is growing. 
Kiran  

> invalid links to solr indexes after a new index is created
> --
>
> Key: SOLR-1376
> URL: https://issues.apache.org/jira/browse/SOLR-1376
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: kiran sugana
> Fix For: 1.4
>
>
> After a new index is created, the links to the old index files are not 
> deleted. To recreate the issue: 
> 1) do an incremental indexing 
> 2) cd /proc/[JAVA_PID]/fd
> 3) ls -la
> {code}
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 75 -> 
> /home//solrhome/data/index/_kja.fdx (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 76 -> 
> /home/./solrhome/data/index/_kk4.tis (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 78 -> 
> /home//solrhome/data/index/_kk4.frq (deleted)
> lr-x-- 1 solr roleusers 64 Jul 23 17:31 79 -> 
> /home//solrhome/data/index/_kk4.prx (deleted)
> {code}
> This is creating performance issues (search slows down significantly). 
> Temp resolution:
>  Restart Solr

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinley wrote:
> I'm fine upgrading, but it seems we should make the 'back compatibility'
> notice more explicit.

Yeah... that should be fun for expert-use plugins in general.  In
Lucene-land, this is the release of the "break"... I think we've
covered the changes reasonably well in our external APIs, but people
can always use pretty much the full Lucene API when writing Solr
plugins.

I think we'll need to document that things in <tokenizer> tags need to
inherit from Tokenizer classes.  It is technically a back-compat
break, but I assume it will affect very few users?

-Yonik
http://www.lucidimagination.com


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Ryan McKinley


On Aug 21, 2009, at 10:49 AM, Yonik Seeley wrote:

On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinley  
wrote:

Actually I think there may be something wrong here.

BaseTokenizerFactory does not make a Tokenizer, it creates a
TokenStream, so it should never be cast to Tokenizer

My custom TokenizerFactory now looks the same as:
o.a.s.analysis.PatternTokenizerFactory


Urg... looks like there's no end-to-end (index then search) test for
PatternTokenizerFactory, so we never caught this.


I guess we need to add one :)



It seems like when something is specified as a <tokenizer> in
schema.xml it should in fact be a tokenizer - it's the only way
tokenstream reuse works.



I don't see anything in Solr that creates a Tokenizer.  The  
TokenizerFactory just creates a TokenStream.


It seems that TokenizerFactory really needs to be:
  public Tokenizer create( Reader input )
rather than:
  public TokenStream create( Reader input );

I don't see any backwards compatible way to make this change!

ideas?
ryan


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Yonik Seeley
On Fri, Aug 21, 2009 at 10:33 AM, Ryan McKinley wrote:
> Actually I think there may be something wrong here.
>
> BaseTokenizerFactory does not make a Tokenizer, it creates a
> TokenStream, so it should never be cast to Tokenizer
>
> My custom TokenizerFactory now looks the same as:
> o.a.s.analysis.PatternTokenizerFactory

Urg... looks like there's no end-to-end (index then search) test for
PatternTokenizerFactory, so we never caught this.

It seems like when something is specified as a <tokenizer> in
schema.xml it should in fact be a tokenizer - it's the only way
tokenstream reuse works.

-Yonik


Re: /trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Ryan McKinley
Actually I think there may be something wrong here.

BaseTokenizerFactory does not make a Tokenizer, it creates a
TokenStream, so it should never be cast to Tokenizer

My custom TokenizerFactory now looks the same as:
o.a.s.analysis.PatternTokenizerFactory

Not sure what to look at next...  ideas?

thanks
ryan


On Fri, Aug 21, 2009 at 10:13 AM, Ryan McKinley wrote:
> Just updated to /trunk and am now seeing this exception:
>
> Caused by: org.apache.solr.client.solrj.SolrServerException:
> java.lang.ClassCastException:
> xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to
> org.apache.lucene.analysis.Tokenizer
>        at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
>        ... 15 more
> Caused by: java.lang.ClassCastException:
> xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to
> org.apache.lucene.analysis.Tokenizer
>        at 
> org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69)
>        at 
> org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74)
>        at 
> org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364)
>        at 
> org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:124)
>        at 
> org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
>        at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
>
>
> Looks like SolrIndexAnalyzer now assumes everything uses the new
> TokenStream API...
>
> I'm fine upgrading, but it seems we should make the 'back compatibility'
> notice more explicit.
>
>
> FYI, this is what the TokenizerFactory looks like:
>
> public class JSONKeyValueTokenizerFactory extends BaseTokenizerFactory
> {
>  ...
>
>  public TokenStream create(Reader input) {
>    final JSONParser js = new JSONParser( input );
>    final Stack<String> keystack = new Stack<String>();
>
>    return new TokenStream()
>    {
>      ...
>


/trunk ClassCastException with BaseTokenizerFactory

2009-08-21 Thread Ryan McKinley
Just updated to /trunk and am now seeing this exception:

Caused by: org.apache.solr.client.solrj.SolrServerException:
java.lang.ClassCastException:
xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to
org.apache.lucene.analysis.Tokenizer
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
... 15 more
Caused by: java.lang.ClassCastException:
xxx.solr.analysis.JSONKeyValueTokenizerFactory$1 cannot be cast to
org.apache.lucene.analysis.Tokenizer
at 
org.apache.solr.analysis.TokenizerChain.getStream(TokenizerChain.java:69)
at 
org.apache.solr.analysis.SolrAnalyzer.reusableTokenStream(SolrAnalyzer.java:74)
at 
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:364)
at 
org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:124)
at 
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)


Looks like SolrIndexAnalyzer now assumes everything uses the new
TokenStream API...

I'm fine upgrading, but it seems we should make the 'back compatibility'
notice more explicit.


FYI, this is what the TokenizerFactory looks like:

public class JSONKeyValueTokenizerFactory extends BaseTokenizerFactory
{
  ...

  public TokenStream create(Reader input) {
final JSONParser js = new JSONParser( input );
final Stack<String> keystack = new Stack<String>();

return new TokenStream()
{
  ...
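
For anyone hitting the same ClassCastException: under the new contract
discussed in this thread, the factory must hand back a Tokenizer subclass
rather than an anonymous TokenStream. A minimal sketch of the shape of the
fix; JSONKeyValueTokenizer is a hypothetical name, JSONParser comes from the
poster's own code, and the token production is elided as in the original post:

    import java.io.IOException;
    import java.io.Reader;
    import java.util.Stack;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.solr.analysis.BaseTokenizerFactory;

    public class JSONKeyValueTokenizerFactory extends BaseTokenizerFactory {

      public TokenStream create(Reader input) {
        // return a Tokenizer (which is-a TokenStream) so the
        // cast in TokenizerChain.getStream succeeds
        return new JSONKeyValueTokenizer(input);
      }

      private static class JSONKeyValueTokenizer extends Tokenizer {
        private JSONParser js;
        private final Stack<String> keystack = new Stack<String>();

        JSONKeyValueTokenizer(Reader input) {
          super(input);
          js = new JSONParser(input);
        }

        @Override
        public void reset(Reader input) throws IOException {
          super.reset(input);
          js = new JSONParser(input); // rebuild parser for the new input
          keystack.clear();
        }

        @Override
        public boolean incrementToken() throws IOException {
          // ... token production elided, as in the original post
          return false;
        }
      }
    }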


[jira] Updated: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-21 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1275:
---

Attachment: SOLR-1275.patch

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch, SOLR-1275.patch, SOLR-1275.patch, 
> SOLR-1275.patch, SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method, somewhat like optimize, offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.
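
As a usage sketch for the feature this issue adds: once wired into
DirectUpdateHandler2, a client should be able to request it as an attribute
on the XML commit message posted to /update, along the lines of:

    <commit expungeDeletes="true"/>

which merges away segments containing deletes without the cost of a full
optimize.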

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-21 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745952#action_12745952
 ] 

Yonik Seeley commented on SOLR-1275:


bq. Calling SR.undelete would remove the deletes and the test would pass?

Simple to fix... check against the exact number of documents instead of 
checking that there are no deletes.


> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch, SOLR-1275.patch, SOLR-1275.patch, 
> SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method, somewhat like optimize, offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1335) load core properties from a properties file

2009-08-21 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1335:
-

Attachment: SOLR-1335.patch

* The properties filename is configurable from solr.xml on a per-core basis
* The testcase is cleaned up 

> load core properties from a properties file
> ---
>
> Key: SOLR-1335
> URL: https://issues.apache.org/jira/browse/SOLR-1335
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1335.patch, SOLR-1335.patch, SOLR-1335.patch, 
> SOLR-1335.patch
>
>
> There are a few ways of loading properties at runtime:
> # using a system property passed on the command line
> # if you use multicore, dropping it into solr.xml
> If neither applies, the only way is to keep a separate solrconfig.xml for 
> each instance.
> #1 is error prone if the user fails to start with the correct system 
> property.
> In our case we have four different configurations for the same deployment, 
> and we have to disable replication of solrconfig.xml.
> It would be nice if I could distribute four properties files so that our 
> ops can drop in the right one and start Solr. It is also possible for 
> operations to edit a properties file, but it is risky for someone who does 
> not understand Solr to edit solrconfig.xml.
> I propose a properties file in the instancedir named solrcore.properties. 
> If present, it would be loaded and its entries added as core-specific 
> properties.
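
To illustrate, a sketch of how this might look in practice (the property
names and values are made up, and the solr.xml attribute name is assumed;
only the fact that the filename is configurable per core comes from the
update above):

  # <instancedir>/solrcore.properties (illustrative)
  data.dir=/var/solr/data/core0
  enable.master=false

  <!-- solr.xml: point each core at its properties file -->
  <core name="core0" instanceDir="core0" properties="solrcore.properties"/>

  <!-- solrconfig.xml can then reference the values via substitution -->
  <dataDir>${data.dir}</dataDir>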

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Updated: (SOLR-1366) UnsupportedOperationException may be thrown when using custom IndexReader

2009-08-21 Thread Shalin Shekhar Mangar
On Fri, Aug 21, 2009 at 3:20 AM, Chris Hostetter wrote:

>
> : Shalin Shekhar Mangar updated SOLR-1366:
> : 
> :
> : Component/s: replication (java)
>
> the issue seems broader than just replication ... I would change this back
> to a generic "search" component, and open new related issue(s) for
> replication (documentation vs custom reader support) ... some pieces of
> this may make it into 1.4 and some may not, so we'll want to track them
> separately.
>
>
I just added "replication" in addition to "search" to the components field
so that this issue shows up against both. I'll open a new issue for the
documentation updates.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Deleting a field at runtime

2009-08-21 Thread Toby Cole
It depends on which level you want to delete it from. If you just want
Solr to know nothing about the field, you can remove it from the schema
and reload the core (or restart Solr).
Technically the field will still exist in the Lucene index, but if
you're only accessing that index through Solr, it will effectively no
longer exist. (I think.)
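
For example, a sketch of the steps (the core name and field are illustrative):

  <!-- schema.xml: remove or comment out the field definition -->
  <!-- <field name="obsolete_field" type="string" indexed="true" stored="true"/> -->

Then reload the core via the CoreAdmin handler:

  http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0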

T

On 20 Aug 2009, at 18:20, KishoreVeleti CoreObjects wrote:



Hi All,

Just completed an interview on SOLR - one of the questions was "is it
possible to remove a field from an existing index". I am not sure what
the business use case is here.

My understanding is that it is not possible. Still, I wanted to hear
from SOLR experts: is it possible to remove a field from an existing
index?

Thanks in Advance,
Kishore Veleti A.V.K.
--
View this message in context: 
http://www.nabble.com/Deleting-a-field-at-runtime-tp25066329p25066329.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



--
Toby Cole
Software Engineer, Semantico Limited
Registered in England and Wales no. 03841410, VAT no. GB-744614334.
Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.

Check out all our latest news and thinking on the Discovery blog
http://blogs.semantico.com/discovery-blog/



Build failed in Hudson: Solr-trunk #901

2009-08-21 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/901/changes

Changes:

[yonik] SOLR-1368: add ms() and sub() functions

[hossman] cleanup of comments relating to 'default' field values; cleanup of 
'timestamp' usage examples -- switched to using 'manufacturedate_dt' as a 
generic date field example since yonik doesn't want the schema to have fields 
with default values uncommented

[hossman] remove executable bit from csv file

[hossman] SOLR-1373: Add Filter query to admin/form.jsp

[hossman] SOLR-1371: LukeRequestHandler/schema.jsp errored if the schema had no 
uniqueKey field. The new test for this also (hopefully) adds some 
future-proofing against similar bugs. As a side effect, 
QueryElevationComponentTest was refactored, and a bug in that test was found.

--
[...truncated 2204 lines...]
[junit] Running org.apache.solr.analysis.TestStopFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.057 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 6.419 sec
[junit] Running org.apache.solr.analysis.TestSynonymMap
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 5.7 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.013 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 36.385 sec
[junit] Running org.apache.solr.client.solrj.SolrExceptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.11 sec
[junit] Running org.apache.solr.client.solrj.SolrQueryTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.664 sec
[junit] Running org.apache.solr.client.solrj.TestBatchUpdate
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 23.244 sec
[junit] Running org.apache.solr.client.solrj.TestLBHttpSolrServer
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 17.019 sec
[junit] Running org.apache.solr.client.solrj.beans.TestDocumentObjectBinder
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.513 sec
[junit] Running org.apache.solr.client.solrj.embedded.JettyWebappTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 18.176 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 11.644 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.605 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 11.11 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.239 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.156 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.541 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 19.198 sec
[junit] Test org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest 
FAILED
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] 
[junit] ERROR: unknown_field_timestamp
[junit] 
[junit] request: http://localhost:60348/example/update?wt=javabin&version=1)
[junit] Tests run: 9, Failures: 0, Errors: 1, Time elapsed: 26.718 sec
[junit] Test org.apache.solr.client.solrj.embedded.SolrExampleJettyTest 
FAILED
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 38.089 sec
[junit] Test org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest 
FAILED
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.263 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.469 sec
[junit] Running 
org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.963 sec
[junit] Running 
org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.623 sec
[junit] Running 
org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time e

Solr nightly build failure

2009-08-21 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 84 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 372 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 166 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 19, Failures: 0, Errors: 0, Time elapsed: 43.438 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 22.641 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 21.906 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 19.662 sec
[junit] Running org.apache.solr.MinimalSchemaTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 12.82 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 6.85 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 6.033 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.165 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 107.777 sec
[junit] Running org.apache.solr.TestTrie
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 17.618 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.711 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.734 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.808 sec
[junit] Running org.apache.solr.analysis.HTMLStripCharFilterTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 1.389 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.57 sec
[junit] Running org.apache.solr.analysis.SnowballPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.925 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.503 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.647 sec
[junit] Running 
org.apache.solr.analysis.TestDelimitedPayloadTokenFilterFactory
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 8.684 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.581 sec
[junit] Running org.apache.solr.analysis.TestKeepFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.251 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.714 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.786 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 5.998 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.164 sec
[junit] Running org.apache.solr.analysis.Te