date:20081002

You could also use the CoreContainer to create a Core from the  
descriptor:


CoreContainer container = new CoreContainer();
CoreDescriptor descriptor = new CoreDescriptor(container,
    "core1", "/Users/erik/apache-solr-1.3.0/example/solr");
SolrCore core = container.create( descriptor );

if you are using a custom solrconfig name, you would need to call  
setConfigName( path ) on the descriptor.


As for closing...  have you tried core.close()?

ryan
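
A general note on why a missing close() can leave a JVM running: any non-daemon thread that is still alive keeps the process up after main() returns. The sketch below uses only the JDK to show the effect. The Worker class is a hypothetical stand-in, not SolrCore, and whether SolrCore holds exactly this kind of executor is an assumption here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class CloseDemo {
    // Hypothetical resource that owns a background worker thread (NOT SolrCore).
    static class Worker implements AutoCloseable {
        // newSingleThreadExecutor() creates a non-daemon thread by default.
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        String run(String query) throws Exception {
            // Submitting a task starts the worker thread; it then idles waiting for more work.
            return executor.submit(() -> "result for " + query).get();
        }

        boolean isClosed() {
            return executor.isShutdown();
        }

        @Override
        public void close() throws InterruptedException {
            // Stop accepting work and let the worker thread finish.
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        Worker worker = new Worker();
        try {
            System.out.println(worker.run("*:*"));
        } finally {
            // Without this call the idle worker thread keeps the JVM alive after main() returns.
            worker.close();
        }
    }
}
```

Commenting out the close() call in main() is an easy way to reproduce a process that prints its result and then never exits.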


On Oct 2, 2008, at 8:49 AM, Erik Hatcher wrote:

I'm doing some Java experiments to get ready for a solr-ruby  
overhaul such that JRuby comes into play nicely so that  
EmbeddedSolrServer can be used transparently too.  I've not tried  
this since the whole CoreContainer/CoreDescriptor stuff was added,  
and I don't quite understand it all.  Here's what I've got:


 public static void main(String[] args) throws IOException,
     ParserConfigurationException, SAXException, SolrServerException {

   CoreContainer container = new CoreContainer();
   SolrConfig config = new SolrConfig(
       "/Users/erik/apache-solr-1.3.0/example/solr", "solrconfig.xml", null);
   CoreDescriptor descriptor = new CoreDescriptor(container,
       "core1", "/Users/erik/apache-solr-1.3.0/example/solr");
   SolrCore core = new SolrCore("core1",
       "/Users/erik/apache-solr-1.3.0/example/solr/data", config, null, descriptor);

   container.register("core1", core, false);
   SolrServer solr = new EmbeddedSolrServer(container, "core1");
   SolrQuery query = new SolrQuery("*:*");
   QueryResponse response = solr.query(query);
   System.out.println("response = " + response);
 }

This works, but has a fair bit of seemingly unnecessary duplication,
and it also leaves the JVM running for some reason.


Is this the proper way to use EmbeddedSolrServer, or are there some  
tips to improving the code and reducing the duplication?


Also, why does the JVM keep running?  Are we spinning off a thread  
that needs to be shut down?  Is there some sort of close() call that  
is needed?


Thanks,
Erik

[jira] Resolved: (SOLR-796) remove unused SolrIndexSearcher from DUH2

2008-10-02 Thread Ryan McKinley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley resolved SOLR-796.


Resolution: Fixed

 remove unused SolrIndexSearcher from DUH2
 -

 Key: SOLR-796
 URL: https://issues.apache.org/jira/browse/SOLR-796
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-796-remove-searcher.patch


 Since the DUH2 does not use the searcher for deletes anymore, it does not 
 need to be able to...
 Check: http://www.nabble.com/Fwd%3A-read-only-SolrCore--td19769173.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Re: What is a Standard SearchComponent? (SOLR-680)



On Oct 1, 2008, at 5:33 PM, Ryan McKinley wrote:
I disagree with Erik that we should have people explicitly  
configure the components.


Folks don't have to explicitly configure them, if they are just  
running with the example configuration - which is more likely than  
not.


Oh, another thing about search components, I don't like the first/ 
last thing - I like it to be explicit, less magic.


As for component registration precedence, it is the configured  
Component that has precedence.  The Component initialization code  
only adds the default Component if that name is not already used.   
Registering your own spellcheck Component will use your component.
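
The precedence rule described above (configured components win, defaults only fill names that are still free) can be sketched with a plain map. This is a JDK-only illustration; the class names used as values are hypothetical, and it is not the actual Solr initialization code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ComponentRegistryDemo {
    // Hypothetical registry: component name -> implementation class name.
    static Map<String, String> buildRegistry(Map<String, String> configured) {
        // Start from whatever the user configured explicitly.
        Map<String, String> registry = new LinkedHashMap<>(configured);

        // Illustrative defaults; the class names are made up for this sketch.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("query", "DefaultQueryComponent");
        defaults.put("stats", "DefaultStatsComponent");
        defaults.put("spellcheck", "DefaultSpellCheckComponent");

        // A default is added only if its name is not already used.
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            registry.putIfAbsent(e.getKey(), e.getValue());
        }
        return registry;
    }

    public static void main(String[] args) {
        Map<String, String> configured = new LinkedHashMap<>();
        configured.put("spellcheck", "MySpellCheckComponent");
        // The user's spellcheck entry survives; query and stats come from the defaults.
        System.out.println(buildRegistry(configured));
    }
}
```

The same lookup also shows the upgrade concern: a user-registered component named "stats" would shadow a newly added default of the same name.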


Right, but what if someone has a stats component now in 1.3 wired  
into a custom request handler (but not /select), then upgrades to  
1.4 with the new implicit stats built-in - then all of a sudden a  
request to /select will use _their_ stats component, not the built  
in one.  Right?



Ok, I see your point...  since we are past 1.3, this may be a moot point,
but how about something like:


* SearchHandler has no components registered and must be configured
manually.
* StandardRequestHandler (currently nothing more than a class that extends
SearchHandler) would register all components with no dependencies - it
would not support things like first/last components.


Users extending SearchHandler would have absolute control -- users  
extending StandardRequestHandler would have standard configuration -  
features may be added between major releases, but not removed.
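
A rough sketch of that split, using stand-in classes rather than the real handlers. The names SearchHandlerSketch and StandardRequestHandlerSketch and the component list are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HandlerProposalDemo {
    // Stand-in for the proposed SearchHandler: starts empty, configured by hand.
    static class SearchHandlerSketch {
        protected final List<String> components = new ArrayList<>();

        void addComponent(String name) {
            components.add(name);
        }

        List<String> getComponents() {
            return components;
        }
    }

    // Stand-in for the proposed StandardRequestHandler: pre-registers a standard set.
    static class StandardRequestHandlerSketch extends SearchHandlerSketch {
        StandardRequestHandlerSketch() {
            // Illustrative list; later releases could append to it but not remove entries.
            components.addAll(Arrays.asList("query", "facet", "highlight", "debug"));
        }
    }

    public static void main(String[] args) {
        SearchHandlerSketch custom = new SearchHandlerSketch();
        custom.addComponent("query"); // absolute control: only what the user adds
        System.out.println("custom:   " + custom.getComponents());
        System.out.println("standard: " + new StandardRequestHandlerSketch().getComponents());
    }
}
```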


ryan

[jira] Updated: (SOLR-433) MultiCore and SpellChecker replication

[
https://issues.apache.org/jira/browse/SOLR-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Lee updated SOLR-433:
--

Attachment: SOLR-433.patch

Includes Stephane's fixes for snappuller and snapinstaller, plus some minor edits

MultiCore and SpellChecker replication
--

Key: SOLR-433
URL: https://issues.apache.org/jira/browse/SOLR-433
Project: Solr
Issue Type: Improvement
Components: replication, spellchecker
Affects Versions: 1.3
Reporter: Otis Gospodnetic
Fix For: 1.4

Attachments: RunExecutableListener.patch, SOLR-433-r698590.patch,
SOLR-433.patch, SOLR-433.patch, solr-433.patch, SOLR-433_unified.patch,
spellindexfix.patch

With MultiCore functionality coming along, it looks like we'll need to be
able to:
A) snapshot each core's index directory, and
B) replicate any and all cores' complete data directories, not just their
index directories.
Pulled from the spellchecker and multi-core index replication thread -
http://markmail.org/message/pj2rjzegifd6zm7m
Otis:
I think that makes sense - distribute everything for a given core, not just
its index. And the spellchecker could then also have its data dir (and only
index/ underneath really) and be replicated in the same fashion.
Right?
Ryan:
Yes, that was my thought. If an arbitrary directory could be distributed,
then you could have
/path/to/dist/index/...
/path/to/dist/spelling-index/...
/path/to/dist/foo
and that would all get put into a snapshot. This would also let you put
multiple cores within a single distribution:
/path/to/dist/core0/index/...
/path/to/dist/core0/spelling-index/...
/path/to/dist/core0/foo
/path/to/dist/core1/index/...
/path/to/dist/core1/spelling-index/...
/path/to/dist/core1/foo


[jira] Created: (SOLR-797) Construct EmbeddedSolrServer response without serializing/parsing

Construct EmbeddedSolrServer response without serializing/parsing
-

 Key: SOLR-797
 URL: https://issues.apache.org/jira/browse/SOLR-797
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
Reporter: Jonathan Lee


Currently, the EmbeddedSolrServer serializes the response and reparses in order 
to create the final NamedList response.  From the comment in 
EmbeddedSolrServer.java, the goal is to:
* convert the response directly into a named list


[jira] Updated: (SOLR-797) Construct EmbeddedSolrServer response without serializing/parsing


 [ 
https://issues.apache.org/jira/browse/SOLR-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lee updated SOLR-797:
--

Priority: Minor  (was: Major)

 Construct EmbeddedSolrServer response without serializing/parsing
 -

 Key: SOLR-797
 URL: https://issues.apache.org/jira/browse/SOLR-797
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
Reporter: Jonathan Lee
Priority: Minor

 Currently, the EmbeddedSolrServer serializes the response and reparses in 
 order to create the final NamedList response.  From the comment in 
 EmbeddedSolrServer.java, the goal is to:
 * convert the response directly into a named list


[jira] Updated: (SOLR-797) Construct EmbeddedSolrServer response without serializing/parsing


 [ 
https://issues.apache.org/jira/browse/SOLR-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Lee updated SOLR-797:
--

Attachment: SOLR-797.patch

This patch contains a first stab at transforming the NamedList without 
serializing it then parsing it from the serialized form.

From what I can tell, all the fields (headers, facets, spelling, etc) returned 
from the handler in the response are valid for output, except that references to 
actual documents need to be resolved.  This patch borrows code from 
NamedListCodec.java and BinaryResponseWriter.java to resolve the documents.

 Construct EmbeddedSolrServer response without serializing/parsing
 -

 Key: SOLR-797
 URL: https://issues.apache.org/jira/browse/SOLR-797
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
Reporter: Jonathan Lee
Priority: Minor
 Attachments: SOLR-797.patch


 Currently, the EmbeddedSolrServer serializes the response and reparses in 
 order to create the final NamedList response.  From the comment in 
 EmbeddedSolrServer.java, the goal is to:
 * convert the response directly into a named list


Re: EmbeddedSolrServer API usage

2008-10-02 Thread Erik Hatcher


Thanks Ryan - good tips, and core.close() was the missing piece, duh.

Here's how it looks in JRuby:

  container = CoreContainer.new
  descriptor = CoreDescriptor.new(container, "core1",
    "/Users/erik/apache-solr-1.3.0/example/solr")

  core = container.create(descriptor)
  container.register("core1", core, false)

  solr = EmbeddedSolrServer.new(container, "core1")
  query = SolrQuery.new("*:*")
  response = solr.query(query)
  puts response
  core.close

Perhaps there should be an overloaded CoreContainer#register(core)
that uses the name from the core descriptor so "core1" doesn't have to
be duplicated?


Erik



[jira] Created: (SOLR-798) FileListEntityProcessor can't handle directories containing lots of files

2008-10-02 Thread Grant Ingersoll (JIRA)

FileListEntityProcessor can't handle directories containing lots of files
-

 Key: SOLR-798
 URL: https://issues.apache.org/jira/browse/SOLR-798
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Reporter: Grant Ingersoll
Priority: Minor


The FileListEntityProcessor currently tries to process all documents in a 
single directory at once, and stores the results into a hashmap.  On 
directories containing a large number of documents, this quickly causes 
OutOfMemory errors.

Unfortunately, the typical fix to this is to hack FileFilter to do the work for 
you and always return false from the accept method.  It may be possible to hook 
up some type of Producer/Consumer multithreaded FileFilter approach whereby the 
FileFilter blocks until the nextRow() mechanism requests another row, thereby 
avoiding the need to cache everything in the map.
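
The producer/consumer idea could look roughly like the following JDK-only sketch. BlockingFileLister and its nextRow() method are hypothetical names for illustration, not DataImportHandler code. The filter hands each file to the consumer one at a time and always returns false, so no result array or map is built up. Note that File.listFiles still reads the directory's file names internally, so this only avoids caching the per-file rows.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

class BlockingFileLister {
    // Sentinel that marks the end of the directory listing.
    private static final File END = new File("");
    // SynchronousQueue has no capacity: put() blocks until nextRow() calls take().
    private final BlockingQueue<File> handoff = new SynchronousQueue<>();
    private boolean finished = false;

    BlockingFileLister(File dir) {
        FileFilter filter = f -> {
            try {
                handoff.put(f); // wait here until the consumer asks for another row
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return false; // never accept, so listFiles() accumulates nothing
        };
        Thread producer = new Thread(() -> {
            try {
                dir.listFiles(filter);
                handoff.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "file-lister");
        producer.setDaemon(true); // an abandoned listing must not keep the JVM alive
        producer.start();
    }

    // Returns the next file, or null once the directory is exhausted.
    File nextRow() throws InterruptedException {
        if (finished) {
            return null;
        }
        File f = handoff.take();
        if (f == END) {
            finished = true;
            return null;
        }
        return f;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("lister-demo");
        for (int i = 0; i < 3; i++) {
            Files.createFile(dir.resolve("doc" + i + ".txt"));
        }
        BlockingFileLister lister = new BlockingFileLister(dir.toFile());
        File f;
        while ((f = lister.nextRow()) != null) {
            System.out.println(f.getName());
        }
    }
}
```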


Re: EmbeddedSolrServer API usage



On Oct 2, 2008, at 1:58 PM, Erik Hatcher wrote:


Perhaps there should be an overloaded CoreContainer#register(core)
that uses the name from the core descriptor so "core1" doesn't have
to be duplicated?




+1


  public SolrCore register(SolrCore core, boolean returnPrev) {
    return register( core.getName(), core, returnPrev );
  }
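
To see how such an overload reads at the call site, here is a self-contained sketch with stand-in Core and Container classes. These are hypothetical and much simpler than SolrCore and CoreContainer; in particular, the handling of returnPrev here is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

class RegisterOverloadDemo {
    // Minimal stand-in for a core that knows its own name.
    static class Core {
        private final String name;

        Core(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Minimal stand-in for the container.
    static class Container {
        private final Map<String, Core> cores = new HashMap<>();

        // Existing style: the caller passes the name explicitly.
        Core register(String name, Core core, boolean returnPrev) {
            Core previous = cores.put(name, core);
            return returnPrev ? previous : null;
        }

        // Proposed overload: take the name from the core itself.
        Core register(Core core, boolean returnPrev) {
            return register(core.getName(), core, returnPrev);
        }

        Core getCore(String name) {
            return cores.get(name);
        }
    }

    public static void main(String[] args) {
        Container container = new Container();
        Core core = new Core("core1");
        container.register(core, false); // "core1" is no longer repeated here
        System.out.println(container.getCore("core1").getName());
    }
}
```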

Re: LogoContest Process Timeline ... was: Re: [Solr Wiki] Update of LogoContest by HossMan

2008-10-02 Thread Lukáš Vlček

Hi,
I have to work on some personal matter. I won't be able to deliver this
week. Is it OK if I deliver this next week? I think this should still be
acceptable according to the original 4 week schedule. (I am sorry about that...)

Lukas

On Tue, Sep 30, 2008 at 9:31 PM, Lukáš Vlček [EMAIL PROTECTED] wrote:



 On Tue, Sep 30, 2008 at 9:12 PM, Chris Hostetter [EMAIL PROTECTED]
  wrote:


 : May I have a question? What is PRC?

 Sorry: it's the Public Relations Committee.  They don't have much of a web
 presence, so i can't include a handy URL explaining all about them, but
 they are the committee established by the ASF Board to oversee all things
 related to Apache PR (including branding and the policies for project
 logos [rant]which projects are expected to follow, but aren't posted
 anywhere for people to find[/rant].)

 http://www.apache.org/foundation/how-it-works.html#other


 OK, what does the PRC have to do with the logo design? Is there any particular
 constraint/request that the logo design must follow? What is it? You
 mentioned that the logo design has to contain the word "Apache", are there any
 other requirements like this?



 : (And I am sorry for not delivering other Logo proposals ... it is due to

 no problem, we're all just volunteering on this after all -- the question is
 do you (as a graphic artist) think 4 weeks is enough time to see some
 really good, creative designs come in?

 -Hoss


 4 weeks sounds good. I will deliver more stuff by the end of this week.
 (Wow! did I say this publicly?)

 Lukas

[jira] Updated: (SOLR-55) TEST of Jira email integration

2008-10-02 Thread Hoss Man Trash Test Account (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man Trash Test Account updated SOLR-55:


Attachment: solr.png

testing image attachment -- please ignore

 TEST of Jira email integration
 --

 Key: SOLR-55
 URL: https://issues.apache.org/jira/browse/SOLR-55
 Project: Solr
  Issue Type: Task
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Trivial
 Attachments: solr.png


 Test issue to experiment with jira email integration.


[jira] Updated: (SOLR-55) TEST of Jira email integration

2008-10-02 Thread Hoss Man Trash Test Account (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man Trash Test Account updated SOLR-55:


Attachment: (was: solr.png)

 TEST of Jira email integration
 --

 Key: SOLR-55
 URL: https://issues.apache.org/jira/browse/SOLR-55
 Project: Solr
  Issue Type: Task
Reporter: Hoss Man
Assignee: Hoss Man
Priority: Trivial

 Test issue to experiment with jira email integration.


Re: Fwd: LogoContest Process Timeline ... was: Re: [Solr Wiki] Update of LogoContest by HossMan


: Perhaps pushing the date to the 20th, and finishing on Thanksgiving?

Yeah, that sounds like a better idea.  i'll update the wiki.

: Using JIRA is good because it takes care of all the IP issues and is
: already there to use, but not so good because the thumbnails look
: crappy and people who submit items can not delete their own.  It also
: falls into the problem of voting on 5 versions of the same logo.

I just double checked ... in spite of what the Manage Attachments page in 
Jira says, anyone can delete a file they themselves attached (i was pretty 
sure because we've had problems in the past with people removing patches 
because they have new approaches and the old ones get lost forever)

The thumbnail issue is admittedly annoying.

: Rather than voting directly off the JIRA page -- When the submissions
: are closed, I suggest we build a special page for the contest and
: only put 'final drafts' on that.   This page will be passed by the

that makes sense -- it will help eliminate the possibility of people 
adding new attachments in the middle of voting as well.  but as i said: if 
people want to withdraw a logo they can take care of that via Jira before 
the deadline.



-Hoss

Re: LogoContest Process Timeline ... was: Re: [Solr Wiki] Update of LogoContest by HossMan


: OK, what does the PRC have to do with the logo design? Is there any particular
: constraint/request that the logo design must follow? What is it? You
: mentioned that the logo design has to contain the word "Apache", are there any
: other requirements like this?

all of the guidelines and requirements they've outlined are on our wiki 
page...

http://wiki.apache.org/solr/LogoContest



-Hoss

Re: Fwd: LogoContest Process Timeline ... was: Re: [Solr Wiki] Update of LogoContest by HossMan


: Yeah, that sounds like a better idea.  i'll update the wiki.

I've made final updates to the process on the wiki ... i'll do a big 
announce to solr-user, [EMAIL PROTECTED] and on the Solr home page tomorrow 
unless anyone objects soon.


-Hoss

Re: What is a Standard SearchComponent? (SOLR-680)


: The reason I thought StatsComponent is default while SpellCheck is not is
: that SpellChecking necessarily requires some configuration.  Stats can be
: there without doing anything -- it is just the cost of checking if
: stats=true in the request.
: 
: I suggest that we add *all* Components that are generally useful off the shelf
: to the StandardRequestHandler.  We should add documentation to say: if you

I think that general philosophy is what makes the most sense ... as long 
as a component can be used without special configuration add it by 
default, but Components should be NOOPs unless activated by params (ie: 
facet=true).

people worried about saving every last cycle can create an explicit list 
of components and trust that no new components will get added to the 
pipeline when they upgrade. ... but for people who upgrade without 
modifying their configs, and may not even know anything about search 
components, give them the new hotness by default, so when they ask "how do 
i...?" and someone says "try adding &foo=true&foo.this=that to your 
request" the hotness starts to work for them without them needing to 
change anything.

(just like when features got added to standard/dismax before we had 
components)
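
A small sketch of the "NOOP unless activated by params" idea, using plain maps in place of Solr's request and response objects. The Component interface and StatsSketch class are hypothetical, and the parameter handling is simplified for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class NoopComponentDemo {
    // Hypothetical component contract: read params, optionally add to the response.
    interface Component {
        void process(Map<String, String> params, Map<String, Object> response);
    }

    // Always in the pipeline, but does nothing unless stats=true is sent.
    static class StatsSketch implements Component {
        @Override
        public void process(Map<String, String> params, Map<String, Object> response) {
            // The only cost for requests that do not opt in is this one check.
            if (!Boolean.parseBoolean(params.getOrDefault("stats", "false"))) {
                return;
            }
            response.put("stats", "computed for field " + params.get("stats.field"));
        }
    }

    public static void main(String[] args) {
        Component stats = new StatsSketch();

        Map<String, Object> plainResponse = new HashMap<>();
        stats.process(new HashMap<>(), plainResponse);
        System.out.println("without stats=true: " + plainResponse);

        Map<String, String> params = new HashMap<>();
        params.put("stats", "true");
        params.put("stats.field", "price");
        Map<String, Object> statsResponse = new HashMap<>();
        stats.process(params, statsResponse);
        System.out.println("with stats=true:    " + statsResponse);
    }
}
```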



-Hoss

Re: What is a Standard SearchComponent? (SOLR-680)


:  As for component registration precedence, it is the configured Component
:  that has precedence.  The Component initialization code only adds the
:  default Component if that name is not already used.  Registering your own
:  spellcheck Component will use your component.
: 
: Right, but what if someone has a stats component now in 1.3 wired into a
: custom request handler (but not /select), then upgrades to 1.4 with the new
: implicit stats built-in - then all of a sudden a request to /select will use
: _their_ stats component, not the built in one.  Right?

this never really sat well with me before either ... but i couldn't 
really place why it didn't sit well until you gave that example.

we can't undo the current behavior because it has its legitimate use 
cases: i've subclassed QueryComponent to modify it with some custom 
behavior, and i've registered an instance of MyQueryComponent with the 
name "query" and now i'm relying on it to get used by default.

I think the solution to this problem is education and documentation ... 
people customizing Solr may occasionally have to make some changes when 
upgrading, it's a fact of life that can't be completely avoided.

Consider another equally plausible situation: i have a custom response 
writer in my Solr 1.2 and it uses a request param named "defType" -- which 
when i upgrade to Solr 1.3 suddenly causes all sorts of things to break, 
because the param name i picked collides with a new param that all the 
request handlers i use pay attention to.

In the request param collision situation users are *really* screwed, 
because they have to change the params their clients send ... by 
comparison, the component name collision situation is trivial to deal 
with, just search and replace "stats" with "mystats" everywhere in your 
solrconfig.xml.


-Hoss

solr 2.0 branch/sandbox?