[jira] Commented: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Ryan McKinley (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509156 ]

Ryan McKinley commented on SOLR-278:


looks good.  Do you have suggestions on how to modify SOLR-266?  The schema 
info is different enough (fields etc.) that nothing popped out at me...

 LukeRequest/Response for handling show=schema
 -

 Key: SOLR-278
 URL: https://issues.apache.org/jira/browse/SOLR-278
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3

 Attachments: LukeSchemaHandling.patch


 The soon-to-be-attached patch adds a method to LukeRequest to set the option 
 for showing the schema from SOLR-266.  The patch also modifies LukeResponse to 
 handle the schema info in the same manner as the fields from the 'normal' 
 luke response.  I think it's worth talking about unifying the response formats 
 so that they aren't different, but that's a larger discussion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Will Johnson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-278:
--

Attachment: LukeSchemaHandling.patch




[jira] Created: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Will Johnson (JIRA)
LukeRequest/Response for handling show=schema
-

 Key: SOLR-278
 URL: https://issues.apache.org/jira/browse/SOLR-278
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 1.3
Reporter: Will Johnson
Priority: Minor
 Fix For: 1.3


The soon-to-be-attached patch adds a method to LukeRequest to set the option 
for showing the schema from SOLR-266.  The patch also modifies LukeResponse to 
handle the schema info in the same manner as the fields from the 'normal' luke 
response.  I think it's worth talking about unifying the response formats so 
that they aren't different, but that's a larger discussion.




[jira] Commented: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Will Johnson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509160 ]

Will Johnson commented on SOLR-278:
---

I guess I was hoping for a superset of features in
LukeResponse.FieldInfo, which would be partially set by the schema and
partially set by the luke-ish info.  We could even merge the two if
it made sense.

In the end I need to get a list of fields that Solr currently knows
about, which seems to be a combination of both the schema and the index
(via dynamic fields).  The current patch does this, but I think there is a
better approach somewhere out there.

- will
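One way to picture the "superset" FieldInfo idea is a single record per field name, with some slots filled from the schema and others from the index, so that a field known only through a dynamic field still appears in the list. This is a hypothetical, self-contained sketch: `MergedFieldInfo`, its two attributes, and the sample field names are illustrative and are not the actual `LukeResponse.FieldInfo` API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "superset" field record: some attributes come from the schema
// (show=schema), others from index statistics. Not the real LukeResponse.FieldInfo.
class MergedFieldInfo {
  final String name;
  String schemaType;     // from the schema; null if the schema does not declare the field
  Integer indexDocCount; // from the index; null if no indexed document has the field

  MergedFieldInfo(String name) {
    this.name = name;
  }

  // Union of schema fields and index fields, keyed (and sorted) by field name.
  static Map<String, MergedFieldInfo> merge(Map<String, String> schemaTypes,
                                            Map<String, Integer> indexDocCounts) {
    Map<String, MergedFieldInfo> all = new TreeMap<>();
    for (Map.Entry<String, String> e : schemaTypes.entrySet()) {
      all.computeIfAbsent(e.getKey(), MergedFieldInfo::new).schemaType = e.getValue();
    }
    for (Map.Entry<String, Integer> e : indexDocCounts.entrySet()) {
      all.computeIfAbsent(e.getKey(), MergedFieldInfo::new).indexDocCount = e.getValue();
    }
    return all;
  }

  public static void main(String[] args) {
    Map<String, String> schema = new TreeMap<>();
    schema.put("id", "string");
    schema.put("name", "text");
    Map<String, Integer> index = new TreeMap<>();
    index.put("id", 3);
    index.put("color_s", 2); // e.g. a field that exists only through a dynamic field
    for (MergedFieldInfo f : merge(schema, index).values()) {
      System.out.println(f.name + " type=" + f.schemaType + " docs=" + f.indexDocCount);
    }
  }
}
```

Keying both sources by field name is what lets either side leave its slots empty; whether the real response classes should be merged this way is the open question in this thread.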






[jira] Commented: (SOLR-278) LukeRequest/Response for handling show=schema

2007-06-29 Thread Ryan McKinley (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509164 ]

Ryan McKinley commented on SOLR-278:


yes, there must be a better solution to merge schema vs index field info.  I'm 
open to any suggestions.

I added your changes in rev 551971





[jira] Created: (SOLR-279) System Properties for Testing are now in Java code AND Ant build.xml

2007-06-29 Thread Eric Pugh (JIRA)
System Properties for Testing are now in Java code AND Ant build.xml


 Key: SOLR-279
 URL: https://issues.apache.org/jira/browse/SOLR-279
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.3
Reporter: Eric Pugh
Priority: Minor
 Fix For: 1.3


The system properties can now be pulled out of build.xml due to commit revision 
551701




[jira] Updated: (SOLR-279) System Properties for Testing are now in Java code AND Ant build.xml

2007-06-29 Thread Eric Pugh (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Pugh updated SOLR-279:
---

Attachment: syspropties.patch

Patch file for build.xml for removing system properties




Thanks for commit 551701!

2007-06-29 Thread Eric Pugh

Yonik,

Thanks for commit 551701; I have created bug
https://issues.apache.org/jira/browse/SOLR-279 for removing the properties
from build.xml as well.


Cheers,

Eric

---
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467






[jira] Resolved: (SOLR-279) System Properties for Testing are now in Java code AND Ant build.xml

2007-06-29 Thread Yonik Seeley (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-279.
---

Resolution: Fixed

committed.




stax vs xpp XmlUpdateHandler

2007-06-29 Thread Ryan McKinley
I just did some performance testing to compare the stax vs xpp 
implementation.  As far as I can tell there is no real difference between 
them.


Using solrj, this adds 1 documents for each handler - running each 
as an independent call.


STAX: 8631 8221 8525 8383 8487 = 42247
XPP:  8309 8438 8261 8794 8237 = 42039
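As a quick check on "no real difference", the sketch below recomputes both totals from the five per-run timings above (milliseconds) and expresses the gap as a percentage of the XPP total. Only the numbers come from the post; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Locale;

// Recomputes the totals of the five reported runs (ms) and their relative gap.
class RunTotals {
  static final long[] STAX = {8631, 8221, 8525, 8383, 8487};
  static final long[] XPP  = {8309, 8438, 8261, 8794, 8237};

  static long total(long[] runs) {
    return Arrays.stream(runs).sum();
  }

  // Percentage by which the first total exceeds the second.
  static double percentDiff(long a, long b) {
    return 100.0 * (a - b) / b;
  }

  public static void main(String[] args) {
    long stax = total(STAX);
    long xpp = total(XPP);
    // Locale.ROOT keeps the decimal separator a '.' regardless of platform locale.
    System.out.println(String.format(Locale.ROOT, "STAX=%d XPP=%d diff=%.2f%%",
        stax, xpp, percentDiff(stax, xpp)));
  }
}
```

It prints `STAX=42247 XPP=42039 diff=0.49%`. A gap of about half a percent is well inside the run-to-run spread of either handler (the STAX runs alone range from 8221 to 8631 ms).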


How do you all feel about moving:
 XmlUpdateRequestHandler -> XppUpdateRequestHandler
 StaxUpdateRequestHandler -> XmlUpdateRequestHandler

then deprecating XppUpdateRequestHandler?  This will urge people to use 
the Stax implementation sooner rather than later and should help iron out any 
problems sooner rather than later.


thoughts?


Here is the actual test code:

  public long makeRequests( String path, int cnt ) throws Exception
  {
    // start from an empty index
    server.deleteByQuery( "*:*" );
    server.optimize();

    long now = System.currentTimeMillis();
    UpdateRequest req = new UpdateRequest();
    req.setPath( path );
    // send each document in its own request
    for( int i=0; i<cnt; i++ ) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField( "id", i+"" );
      doc.addField( "name", "hello" );
      // loop bounds as archived: x starts at 5 with x<5, so no "feature" values are added
      for( int x=5; x<5; x++ ) {
        doc.addField( "feature", "feature:"+x );
      }
      req.add( doc );
      server.request( req );
      req.clear();
    }
    server.commit();
    long elapsed = System.currentTimeMillis() - now;

    // make sure every document made it into the index
    QueryResponse response = server.query( new SolrQuery( "*:*" ) );
    if( cnt != response.getResults().getNumFound() ) {
      throw new Exception( "did not add everything!" );
    }
    return elapsed;
  }


Re: stax vs xpp XmlUpdateHandler

2007-06-29 Thread Chris Hostetter

: How do you all feel about moving:
:   XmlUpdateRequestHandler -> XppUpdateRequestHandler
:   StaxUpdateRequestHandler -> XmlUpdateRequestHandler
:
: then deprecating XppUpdateRequestHandler?  This will urge people to use
: the Stax implementation sooner rather than later and should help iron out any
: problems sooner rather than later.

I'm kinda out of the loop on the whole Stax/Xpp/Xml update parsing stuff
... am i remembering correctly the end game goal is to reduce/eliminate
dependencies on XPP?  (because   ?  stax is Java standard
included out-of-the-box with java6? (i'm guessing))


: Here is the actual test code:

those are some fairly small documents ... we should probably test out some
bigger inputs.

A lot of people seem to be sending multiple documents at a time as well,
so we should test that use case (ie: add containing 1 small
documents; add containing 100 medium documents; add containing 1 big
document)

for the purpose of perf testing the XML parsing it might also make sense
to use a schema where every field is ignored (ie: no analysis, no stored
values) to help isolate the parsing costs.


-Hoss
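The input shapes suggested above (many small documents in one add, a moderate number of medium documents, one big document) can be generated as raw XML update messages. This is a minimal sketch, assuming the usual `<add><doc><field name="...">` update format; `UpdatePayloads`, the field names, and the sizes in `main` are hypothetical and chosen only for illustration.

```java
// Hypothetical generator for XML update messages of varying shape:
// docCount documents per <add>, each with fieldsPerDoc "text" values.
class UpdatePayloads {
  static String add(int docCount, int fieldsPerDoc, int firstId) {
    StringBuilder sb = new StringBuilder("<add>");
    for (int i = 0; i < docCount; i++) {
      sb.append("<doc><field name=\"id\">").append(firstId + i).append("</field>");
      for (int f = 0; f < fieldsPerDoc; f++) {
        sb.append("<field name=\"text\">some text ").append(f).append("</field>");
      }
      sb.append("</doc>");
    }
    return sb.append("</add>").toString();
  }

  // Counts non-overlapping occurrences of a literal substring (used as a sanity check).
  static int count(String s, String needle) {
    int n = 0;
    for (int i = s.indexOf(needle); i >= 0; i = s.indexOf(needle, i + needle.length())) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    // Illustrative sizes only: many small docs, some medium docs, one big doc.
    String small = add(1000, 2, 0);
    String medium = add(100, 50, 1000);
    String big = add(1, 5000, 2000);
    System.out.println(count(small, "<doc>") + " " + count(medium, "<doc>") + " "
        + count(big, "<doc>"));
  }
}
```

Posting each payload to both handlers, under a schema whose fields are neither indexed nor stored, would keep the comparison focused on parsing.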



Re: stax vs xpp XmlUpdateHandler

2007-06-29 Thread Ryan McKinley


> I'm kinda out of the loop on the whole Stax/Xpp/Xml update parsing stuff
> ... am i remembering correctly the end game goal is to reduce/eliminate
> dependencies on XPP?  (because   ?  stax is Java standard
> included out-of-the-box with java6? (i'm guessing))



For me the biggest reason is to decouple the parsing from the actual 
update processing.  I need to do custom processing in between 
(SOLR-269).  Stax is a growing standard, so it seems like the right 
choice if we are reworking document parsing.  Depending on your 
preference, it is also a bit easier to work with and more readable.


With the parsing separated from indexing, it would be straightforward to 
have a single UpdateRequestHandler that could read the content type and 
pick how to parse the documents - using the same indexing 
strategies/format/processor etc.
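A minimal sketch of that separation, using hypothetical types (`UpdateDispatcher`, `DocumentLoader`, `DocumentProcessor`, `StaxXmlLoader`) rather than Solr's actual classes: a loader turns a request body into documents and hands each one to a processor as soon as it is parsed, and the dispatcher chooses the loader from the content type. The XML loader uses the JDK's StAX API (`javax.xml.stream`); documents are plain maps here for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical single update handler: picks a loader by content type, and every
// loader feeds the same processor, so parsing and processing stay separate.
class UpdateDispatcher {
  private final Map<String, DocumentLoader> loaders = new HashMap<>();

  void register(String contentType, DocumentLoader loader) {
    loaders.put(contentType, loader);
  }

  void handle(String contentType, String body, DocumentProcessor processor) throws Exception {
    // Ignore parameters such as "; charset=utf-8" when choosing the loader.
    String baseType = contentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
    DocumentLoader loader = loaders.get(baseType);
    if (loader == null) {
      throw new IllegalArgumentException("no loader for content type: " + contentType);
    }
    loader.load(body, processor);
  }

  public static void main(String[] args) throws Exception {
    UpdateDispatcher dispatcher = new UpdateDispatcher();
    dispatcher.register("text/xml", new StaxXmlLoader());
    String body = "<add><doc><field name=\"id\">1</field>"
        + "<field name=\"name\">hello</field></doc></add>";
    dispatcher.handle("text/xml; charset=utf-8", body,
        doc -> System.out.println("processing " + doc));
  }
}

// Hypothetical: receives each document as soon as it has been parsed.
interface DocumentProcessor {
  void processAdd(Map<String, List<String>> doc);
}

// Hypothetical: parses one request-body format and feeds the processor.
interface DocumentLoader {
  void load(String body, DocumentProcessor processor) throws Exception;
}

// Parses <add><doc><field name="...">value</field></doc></add> with the JDK's StAX API.
class StaxXmlLoader implements DocumentLoader {
  private final XMLInputFactory factory = XMLInputFactory.newInstance();

  @Override
  public void load(String body, DocumentProcessor processor) throws XMLStreamException {
    XMLStreamReader r = factory.createXMLStreamReader(new StringReader(body));
    Map<String, List<String>> doc = null;
    while (r.hasNext()) {
      int event = r.next();
      if (event == XMLStreamConstants.START_ELEMENT && "doc".equals(r.getLocalName())) {
        doc = new LinkedHashMap<>();
      } else if (event == XMLStreamConstants.START_ELEMENT
          && "field".equals(r.getLocalName()) && doc != null) {
        String name = r.getAttributeValue(null, "name");
        // getElementText() reads the value and leaves the reader on </field>
        doc.computeIfAbsent(name, k -> new ArrayList<>()).add(r.getElementText());
      } else if (event == XMLStreamConstants.END_ELEMENT && "doc".equals(r.getLocalName())) {
        processor.processAdd(doc); // hand off one document at a time
        doc = null;
      }
    }
    r.close();
  }
}
```

Adding another input format would then mean registering one more `DocumentLoader`; the processor, where custom logic like SOLR-269's would sit, stays unchanged.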




> A lot of people seem to be sending multiple documents at a time as well,
> so we should test that use case (ie: add containing 1 small
> documents; add containing 100 medium documents; add containing 1 big
> document)



that makes sense.  I don't claim the tests I ran are representative; I 
just wanted to make sure the overall speeds are within the same ballpark.

This one sends 1 docs together (with 10 text fields), then 1 
docs individually, each with 100 text fields.  Still not the most 
scientific, but here it is:


STAX: 57642
XPP: 58012


  @Override public void setUp() throws Exception
  {
    super.setUp();

    // setup the server...
    server = new EmbeddedSolrServer( SolrCore.getSolrCore() );
  }

  public SolrInputDocument createDocument( int id, int fcnt )
  {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField( "id", id+"" );
    doc.addField( "name", "hello" );
    // adds fcnt-5 "text" values (x runs from 5 to fcnt-1)
    for( int x=5; x<fcnt; x++ ) {
      doc.addField( "text", "this is just some text with asgasdg; "+x );
    }
    return doc;
  }

  public long makeRequests( String path, int cnt ) throws Exception
  {
    server.deleteByQuery( "*:*" ); // delete everything!
    server.optimize();

    long now = System.currentTimeMillis();
    UpdateRequest req = new UpdateRequest();
    req.setPath( path );

    // Send all the docs together in a single request
    for( int i=0; i<cnt; i++ ) {
      req.add( createDocument( i, 10 ) );
    }
    server.request( req );
    req.clear();

    // Send them one at a time, each in its own request
    for( int i=0; i<cnt; i++ ) {
      req.add( createDocument( i+cnt, 100 ) );
      server.request( req );
      req.clear();
    }
    server.commit();
    long elapsed = System.currentTimeMillis() - now;

    // both batches should be searchable: cnt batched + cnt individual docs
    QueryResponse response = server.query( new SolrQuery( "*:*" ) );
    if( (cnt*2) != response.getResults().getNumFound() ) {
      throw new Exception( "did not add everything!" );
    }
    return elapsed;
  }


  /**
   * query the example
   */
  public void testExampleConfig() throws Exception
  {
    // Empty the database, index both batches, and report the elapsed time
    long time = makeRequests( "/update", 1 );
    System.out.println( "time: " + time );
  }




Re: stax vs xpp XmlUpdateHandler

2007-06-29 Thread Yonik Seeley

On 6/29/07, Ryan McKinley [EMAIL PROTECTED] wrote:

> How do you all feel about moving:
>   XmlUpdateRequestHandler -> XppUpdateRequestHandler
>   StaxUpdateRequestHandler -> XmlUpdateRequestHandler
>
> then deprecating XppUpdateRequestHandler?


+1

I think we could remove the XppUpdateRequestHandler relatively quickly
to get rid of the XPP dependency.  It's more of an implementation
detail and shouldn't be visible to most users.

-Yonik


Re: stax vs xpp XmlUpdateHandler

2007-06-29 Thread Ryan McKinley



> so we should test that use case (ie: add containing 1 small
> documents; 


For processing a single request with 1 documents, the existing XPP 
update handler is faster than the new StaxUpdateHandler.


XPP: 6888 6714
STAX: 8665 8313

I looked into it, and the difference seems to be entirely in the logging 
strategy.  The XPP handler prints out a single line for all 10K docs:


INFO: added id={0,1,2,3,4,5,6,7,8,9,10,11,12,13,1...

The STAX one sends each document to the processor, which logs each add 
individually:


INFO: added id={0} in 28ms  [29]
INFO: added id={1} in 3ms  [33]
INFO: added id={2} in 1ms  [35]
...

If I remove logging, the same test runs in:

STAX: 6783 6834

essentially equivalent to the XPP version

ryan
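The two logging strategies described above can be contrasted with `java.util.logging`. This is a hypothetical sketch, not the handlers' actual code: `perDocument` writes one record per added document, while `perRequest` collects the ids and writes a single summary record at the end. A counting `Handler` stands in for the console so the number of records is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical comparison of per-document vs. per-request add logging.
class AddLogging {
  // Counts published records instead of printing them.
  static class CountingHandler extends Handler {
    int records;
    @Override public void publish(LogRecord record) { records++; }
    @Override public void flush() {}
    @Override public void close() {}
  }

  static Logger newLogger(String name, Handler handler) {
    Logger log = Logger.getLogger(name);
    log.setUseParentHandlers(false); // keep records off the console
    log.addHandler(handler);
    log.setLevel(Level.INFO);
    return log;
  }

  // One log record per added document.
  static void perDocument(Logger log, int docs) {
    for (int id = 0; id < docs; id++) {
      log.info("added id={" + id + "}");
    }
  }

  // Ids are collected and a single record is written when the request finishes.
  static void perRequest(Logger log, int docs) {
    List<Integer> ids = new ArrayList<>();
    for (int id = 0; id < docs; id++) {
      ids.add(id);
    }
    log.info("added id=" + ids);
  }

  public static void main(String[] args) {
    CountingHandler a = new CountingHandler();
    CountingHandler b = new CountingHandler();
    perDocument(newLogger("demo.perDocument", a), 1000);
    perRequest(newLogger("demo.perRequest", b), 1000);
    System.out.println(a.records + " vs " + b.records);
  }
}
```

With per-document logging the cost of formatting and writing a record grows with the number of documents in the request, whereas the per-request strategy pays it once.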


Re: stax vs xpp XmlUpdateHandler

2007-06-29 Thread Yonik Seeley

On 6/29/07, Ryan McKinley [EMAIL PROTECTED] wrote:

> If I remove logging, the same test runs in:
>
> STAX: 6783 6834
>
> essentially equivalent to the XPP version


What about if you remove the logging for the XPP version too?

-Yonik


Re: stax vs xpp XmlUpdateHandler

2007-06-29 Thread Yonik Seeley

On 6/29/07, Ryan McKinley [EMAIL PROTECTED] wrote:

> > so we should test that use case (ie: add containing 1 small
> > documents;
>
> For processing a single request with 1 documents, the existing XPP
> update handler is faster than the new StaxUpdateHandler.
>
> XPP: 6888 6714
> STAX: 8665 8313


Have you tried Woodstox to see how it compares?

If you do more testing, in addition to Hoss' recommendations, I'd also
remove the unused elements (copyFields, dynamicFields) from the schema
(as well as testing w/o logging).

-Yonik
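Trying Woodstox mostly comes down to which `XMLInputFactory` implementation `XMLInputFactory.newInstance()` returns. The lookup honors the `javax.xml.stream.XMLInputFactory` system property (as well as a provider registered on the classpath), so with the Woodstox jar available, setting that property to `com.ctc.wstx.stax.WstxInputFactory` should select it. The sketch below uses a hypothetical `ParseTimer` class and made-up input. It prints the implementation in use and times a parse-only pass, which also keeps indexing and logging out of the measurement.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical parse-only timing harness for whichever StAX implementation is configured.
class ParseTimer {
  // Counts <doc> start elements; walking the whole stream is what gets timed.
  static int countDocs(XMLInputFactory factory, String xml) throws XMLStreamException {
    XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
    int docs = 0;
    while (r.hasNext()) {
      if (r.next() == XMLStreamConstants.START_ELEMENT && "doc".equals(r.getLocalName())) {
        docs++;
      }
    }
    r.close();
    return docs;
  }

  // Builds a made-up <add> message with small documents.
  static String sampleAdd(int docs) {
    StringBuilder sb = new StringBuilder("<add>");
    for (int i = 0; i < docs; i++) {
      sb.append("<doc><field name=\"id\">").append(i)
        .append("</field><field name=\"name\">hello</field></doc>");
    }
    return sb.append("</add>").toString();
  }

  public static void main(String[] args) throws XMLStreamException {
    // Honors -Djavax.xml.stream.XMLInputFactory=... when it is set.
    XMLInputFactory factory = XMLInputFactory.newInstance();
    System.out.println("StAX implementation: " + factory.getClass().getName());

    String xml = sampleAdd(1000);
    long start = System.nanoTime();
    int docs = countDocs(factory, xml);
    long ms = (System.nanoTime() - start) / 1_000_000;
    System.out.println(docs + " docs parsed in " + ms + " ms");
  }
}
```

Running it once with the property pointing at Woodstox and once without gives a parser-only comparison; a single timed pass is noisy, so repeating it and discarding the warm-up runs would be advisable.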