Re: Confusion about entities and documents

2010-10-22 Thread harrysmith

>What I get when I search for, say, "XYZ", is a document that has XYZ Corp as
>a manufacturer name, but the array of parts_manu appears to be a child of
>the document, not the parts array.
>
>Is this the correct behavior, insofar as a document has a single level of
>elements, and that's it? If so, what might be a better strategy for being
>able to maintain the hierarchy of information within a document?
>

Yes, this is the correct behavior. I still struggle with the same issue, and
there are no 'best practices' (that I have found, at least) for maintaining
relationships within a Solr doc. The usual argument is that Solr is not the
correct place for these representations and should only hold a flat version
of your document.

For a similar question see: 
http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593

A few possible solutions are posted there, and I'm interested in how others
have tackled this issue.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Confusion-about-entities-and-documents-tp1753926p1755152.html
Sent from the Solr - User mailing list archive at Nabble.com.


DataImportHandler Error CHARBytesToJavaChars

2010-09-30 Thread harrysmith

Anyone ever see this error on an import? 

Caused by: java.lang.NullPointerException
at oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)

The Oracle column being converted is VARCHAR2(4000 Char) and there are NULLs
present in the record set.

Environment: Solr 1.4, Windows, Jetty


Full stack trace below:

at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.NullPointerException
at oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)
at oracle.jdbc.driver.DBConversion.CHARBytesToJavaChars(DBConversion.java:892)
at oracle.jdbc.driver.T4CVarcharAccessor.unmarshalOneRow(T4CVarcharAccessor.java:282)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:919)
at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:843)
at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:630)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:961)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1072)
at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:845)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1154)
at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1726)
at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1696)
at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
... 32 more
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-Error-CHARBytesToJavaChars-tp1611016p1611016.html


Re: DataImportHandler dynamic fields clarification

2010-09-30 Thread harrysmith

>
>Two things: first, are your DB columns uppercase, as this would affect the output?
>

Interesting, I was under the impression that case does not matter. 

From http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config :
"It is possible to totally avoid the field entries in entities if the names
of the fields are same (case does not matter) as those in Solr schema"

I confirmed that, for dynamic fields, the case of the field name in
schema.xml must match the case of the database column. The wiki statement
above is therefore incorrect, or at the very least confusing; this may be a
bug.
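
To illustrate the behavior described above, here is a minimal Python sketch of case-sensitive wildcard matching. It is not Solr's matching code; the function name matches_dynamic_field and the two patterns are made up for the example, and the only assumption is that a column name is compared to a dynamicField wildcard without folding case.

```python
from fnmatch import fnmatchcase


def matches_dynamic_field(column_name: str, pattern: str) -> bool:
    """Return True if column_name matches the wildcard pattern, case-sensitively."""
    return fnmatchcase(column_name, pattern)


# Column names as Oracle reports them (uppercase).
columns = ["COLUMN_1", "COLUMN_2", "COLUMN_100"]

# Hypothetical lowercase wildcard: nothing matches.
lower = [c for c in columns if matches_dynamic_field(c, "column_*")]

# Hypothetical uppercase wildcard: every column matches.
upper = [c for c in columns if matches_dynamic_field(c, "COLUMN_*")]

print(lower)
print(upper)
```

Under that assumption, a lowercase wildcard picks up none of the uppercase columns, which lines up with the "Nothing Imported" results in the scenarios that follow.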

My database is Oracle 10g and the column names have been created in all
uppercase in the database. 

In Oracle: 
Table name: wide_table
Column names: COLUMN_1 ... COLUMN_100 (yes, uppercase)

Please see following scenarios and results I found:

data-config.xml




schema.xml


Result:
Nothing Imported

=

data-config.xml




schema.xml


Result:
Note query column names changed to uppercase.
Nothing Imported

=


data-config.xml




schema.xml


Result:
Note ONLY the field entry was changed to caps

All records imported, with only COLUMN_100 id field.



data-config.xml




schema.xml


Result:
Note that BOTH the field entry in data-config.xml and the dynamicField
wildcard in schema.xml were changed to caps.

All records imported, with all fields specified. This is the behavior
desired.

=

>Second, what does your db-data-config.xml look like?

The relevant data-config.xml is as follows:



 



Ideally, I would rather have the query be 'select * from wide_table', with
the fields matched dynamically by column name against the dynamicField
wildcard in schema.xml.

 


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1609578.html


DataImportHandler dynamic fields clarification

2010-09-29 Thread harrysmith

Looking for some clarification on DIH to make sure I am interpreting this
correctly.

I have a wide DB table, 100 columns. I'd rather not have to add 100 values
in schema.xml and data-config.xml. I was under the impression that if a
column name matched a dynamicField name, it would be added. I am not
finding this to be the case; it only works when the column name is
explicitly listed as a static field.

Example: 100 column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'

If I add something like:

to schema.xml, and don't reference the column in data-config entity/field
tag, it gets imported, as expected.

However, if I use:

it does not get imported into Solr, although I would expect it to be.


Is this the expected behavior?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1606159.html


Re: DataImportHandler and SAXParseExceptions with Jetty

2010-08-13 Thread harrysmith


Shawn Heisey-4 wrote:
> 
> Because < and > are critical characters in XML, you have to encode them 
> to actually use them as part of your config, just as you do on an HTML 
> page.  Use &lt; instead of <.  When I first ran into this, I was 
> surprised that &gt; was not required as well, but it's probably a good 
> idea to use it, just in case things tighten up in the future.
> 

Thanks, confirming this worked using '&lt;' instead of <. It would help
to note this in the wiki, as I found it confusing that the deltaQuery
examples used '<'.
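
For anyone hitting the same thing, here is a small Python sketch (standard library only) showing why the raw '<' fails and how the escaped form parses. The entity name, table, and column in the query are hypothetical, not taken from my actual data-config.xml.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import quoteattr


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# Hypothetical query containing a 'less than' comparison.
query = "select id from item where last_modified < '2010-08-01'"

# Raw '<' inside an attribute value is not well-formed XML.
raw = '<entity name="item" query="' + query + '"/>'

# quoteattr escapes &, < and > and adds the surrounding quotes.
escaped = '<entity name="item" query=' + quoteattr(query) + "/>"

print(parses(raw))
print(parses(escaped))
print(escaped)
```

The parser hands the original query text back once the attribute is read, so the SQL that reaches the database is unchanged.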

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-and-SAXParseExceptions-with-Jetty-tp1125898p1136004.html


DataImportHandler and SAXParseExceptions with Jetty

2010-08-12 Thread harrysmith

Win XP, Solr 1.4.1 out-of-the-box install, using Jetty. If I add a greater
than or less than sign (i.e. < or >) in any XML field and attempt to load or
run from the DataImport console, I receive a SAXParseException. Example
follows.

If I don't have a 'less than' it works just fine. I know this must work,
because the examples given on the wiki show deltaQueries using a greater
than/less than comparison.


Relevant snippet from data-config.xml :



Stack trace received:
org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid
at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:121)
at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:222)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)
at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:101)
at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
... 22 more
Caused by: org.xml.sax.SAXParseException: The value of attribute "query" associated with an element type "null" must not contain the '<' character.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:178)
... 24 more

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImportHandler-and-SAXParseExceptions-with-Jetty-tp1125898p1125898.html


Re: Schema Definition Question

2010-08-11 Thread harrysmith

I think I know where you're headed; I was struggling with the same issue. In
my case, using results from Solr, I link to a detailed profile using an ID,
but I am displaying the String value. I was looking for something like:



12345

   Feature 1 label
   1 


   Feature 2 label
   2 




...or something similar, some way of linking child items together.
Unfortunately, this isn't how Solr works.

This issue is addressed in the Solr 1.4 book by Smiley and Pugh. This
related snippet is from Chapter 2, page 36, dealing with an example
application with a Music artist's name, and a related id.

"...If we only record the name, then it is problematic to do things like
have links in the UI from a band member to that member's detail page... This
means that we'll need to have an additional multi-valued field for the
member's ID. Multi-valued fields maintain ordering so that the two fields
would have corresponding values at a given index. Beware, there can be a
tricky case when one of the values can be blank, and you need to come up
with a placeholder. The client code would have to know about this
placeholder."

So it seems we can rely on the multivalued fields keeping the same order,
and use the same index position to pair up their values. This seems clunky
to me, but I have not come across any other solutions.
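
A rough Python sketch of what the client code would look like with this approach. The field names member_name and member_id, the sample values, and the "-" placeholder are all assumptions for illustration; the book excerpt does not specify them.

```python
PLACEHOLDER = "-"  # hypothetical marker stored when a member has no ID


def pair_members(names, ids, placeholder=PLACEHOLDER):
    """Pair two parallel multi-valued fields by index position.

    Placeholder IDs are turned back into None so callers can skip the link.
    """
    if len(names) != len(ids):
        raise ValueError("parallel multi-valued fields must have equal length")
    return [(n, None if i == placeholder else i) for n, i in zip(names, ids)]


# A flat Solr-style document with two parallel multi-valued fields.
doc = {
    "member_name": ["Alice", "Bob", "Carol"],
    "member_id": ["11", "-", "13"],
}

pairs = pair_members(doc["member_name"], doc["member_id"])
for name, member_id in pairs:
    print(name, member_id)
```

The length check is there because a missing value without a placeholder would silently shift every later pair by one position.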
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Schema-Definition-Question-tp1049966p1105593.html


Re: DIH transformer script size limitations with Jetty?

2010-08-11 Thread harrysmith

To follow up on my own question, it appears this is only an issue when using
the DataImport console debugging tools. It looks like when submitting the
debugging request, the data-config.xml is sent via a GET request, which
would fail. However, using the exact same data-config.xml via a full-import
operation (i.e. not a dry-run debug), it looks like the request is sent via
POST and the import works fine.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-transformer-script-size-limitations-with-Jetty-tp1091246p1100285.html


Re: How to compile nightly build?

2010-08-10 Thread harrysmith

In this particular case I would like to get the trunk. Is there a different
link for binary distributions of nightly builds?

I had been downloading from here: 
http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/solr/dist/

In case I do want to compile from source, am I missing a step?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-compile-nightly-build-tp1077115p1080266.html


How to compile nightly build?

2010-08-10 Thread harrysmith

I am attempting to follow the instructions located at:

http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example

I have downloaded the most recent clean build from Hudson.

After running 'ant example' I get the following error:


C:\solr_build\apache-solr-4.0-2010-07-27_08-06-29>ant example
Buildfile: C:\solr_build\apache-solr-4.0-2010-07-27_08-06-29\build.xml

init-forrest-entities:

compile-lucene:

BUILD FAILED
C:\solr_build\apache-solr-4.0-2010-07-27_08-06-29\common-build.xml:214:
C:\solr_build\modules\analysis\common does not exist.

Total time: 0 seconds
=

What is the correct procedure?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-compile-nightly-build-tp1077115p1077115.html


Re: DIH and multivariable fields problems

2010-08-09 Thread harrysmith

This is increasingly looking like a bug. To recap, I am trying to use
the DIH to import multivalued dynamic fields, using a variable to name
that field.

Upon further testing, the multivalued import works fine with a
static/constant name, but only the first record is kept when the field is
named dynamically. See below for relevant snippets.

From schema.xml:


From data-config.xml:








Produces the following, note that there are 3 records that should be
returned and are correctly done, with the field name being a constant.

- 
- 
  9892962 
- 
  record 1 
  record 2 
  record 3 
  Polygraph Newsletter Title 
  
- 
  Polygraph Newsletter Title 
  
  
  

===

Now, changing the field name to a variable..., note only the first record is
retained for the 'Relation_s' field -- there should be 3 records.

 
becomes
 

produces the following:
- 
- 
- 
  record 1 
  
- 
  Polygraph Newsletter Title 
  
  9892962 
- 
  Polygraph Newsletter Title 
  
  
  

Only the first record is retained. There was also another post (which
received no replies) in the archive that reported the same issue. The DIH
debug logs do show 3 records correctly being returned, so somehow these are
not getting added.



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1065244.html


Re: DIH and multivariable fields problems

2010-08-06 Thread harrysmith

That's fine, I'm OK with the sub-queries; the additional overhead is to be
expected and is how I would have thought it would work.

I guess my question is: does it work that way at all, or am I misinterpreting
something? Have others successfully imported dynamic multivalued fields in a
child entity using the DataImportHandler, with the child entity returning
multiple records from an RDBMS?

Someone else had a similar issue recently but no one replied. See:
http://lucene.472066.n3.nabble.com/dataimporthandler-multivalued-dynamic-fields-td684509.html#a684509


Amit Nithian wrote:
> 
>>That's probably the most efficient way to do it... I believe the line you
>>are referring allows you to have sub-entities which , in the RDBMS, would
>>execute a separate query for each parent given a primary key. The downside
>>to this though is that for each parent you will be executing N separate
>>queries.
> 


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1033222.html


Re: DIH and multivariable fields problems

2010-08-06 Thread harrysmith

Thanks, this helps a great deal, and I may be able to use this method.

Is this how DIH is intended to be used? Should the multiple values be returned
in 1 row and then manipulated by a transformer? This is fine, but it is
unclear from the documentation. I was under the assumption that multiple
rows returned for a child entity with the same parent would be able to
create a multivalued entry.

From the DataImportHandler wiki:

"...it is possible to create a multivalued field by joining an entity with
another, i.e. if the sub-entity returns multiple rows for one row from parent
entity it can go into a multivalued field"





  For multi-valued fields using the DIH, I use group_concat with the
RegexTransformer's splitBy:
ex:




hope that's helpful.
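
As a rough illustration of the idea (not the actual RegexTransformer code), the Python sketch below splits one concatenated column value into a list of values. The helper split_by, the column name terms, and the comma separator are assumptions for the example; dropping empty pieces is a choice made in this sketch only.

```python
import re


def split_by(value, pattern):
    """Split a concatenated column value into a list using a regex separator."""
    if value is None:
        return []
    # Drop empty pieces, e.g. from a trailing separator.
    return [part for part in re.split(pattern, value) if part]


# One row as the database would return it after concatenating child rows.
row = {"id": "9892962", "terms": "record 1,record 2,record 3"}

# After the split, the single string becomes a multi-valued field.
row["terms"] = split_by(row["terms"], ",")
print(row["terms"])
```

The separator needs to be a string that cannot occur inside a value, otherwise a single value would be split in two.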

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests:
http://gradschoolnow.com


On 8/6/10 4:39 PM, harrysmith wrote:
> I'm having a difficult time understanding how multivariable fields work
> with the DataImportHandler when the source is an RDBMS. I've read the
> following from the wiki:
>

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1033045.html


DIH and multivariable fields problems

2010-08-06 Thread harrysmith

I'm having a difficult time understanding how multivariable fields work with
the DataImportHandler when the source is an RDBMS. I've read the following
from the wiki:

--
What is a row?

A row in DataImportHandler is a Map (Map<String, Object>). In the map, the
key is the name of the field and the value can be anything which is a valid
Solr type. The value can also be a Collection of the valid Solr types (this
may get mapped to a multi-valued field). If the DataSource is RDBMS a query
cannot emit a multivalued field. But it is possible to create a multivalued
field by joining an entity with another, i.e. if the sub-entity returns
multiple rows for one row from parent entity it can go into a multivalued
field. If the datasource is xml, it is possible to return a multivalued
field.
--

How does one 'join an entity with another'?  Below are the relevant sections
of my schema.xml and data-config.xml.

schema.xml



=

data-config.xml



 
  
 
   
  




I have multiple terms (rows) in the term_metadata table that are returned
from the query, but only the first one gets added. Am I missing something
obvious?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1032893.html


Re: Some basic DataImportHandler questions

2010-08-04 Thread harrysmith

Thanks. I think part of my issue may be that I am misunderstanding how to use
the entity and field tags to import data in a particular format, and I am
looking for a few more examples.

Let's say I have a database table with 2 columns that contain metadata fields
and values, and I would like to import this into Solr and keep the pairs
together. An example database table follows, consisting of two String
columns, one containing metadata names and the other metadata values (column
names: metadata_name, metadata_value in this example). There may be multiple
records for a name. The set of potential metadata_names is unknown; it could
be anything.

metadata_name   metadata_value
=============   ==============
title           blah blah
subject         some subject
subject         another subject
name            some name


What is the proper way to import these and keep the name/value pairs intact?
I am seeing the following after import:


title
subject
name

−

blah blah
some subject
another subject
some name


Ideally, the end goal would be something like below:


some subject



some name


etc

It feels like I am missing something obvious and this would be a common
structure for imports.
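
To make the goal concrete, here is a small Python sketch of the grouping I am after, written outside of Solr. The function rows_to_fields and the "_s" suffix are assumptions for illustration (the suffix presumes a multi-valued dynamicField wildcard such as *_s exists in schema.xml); it is not something DIH does for you.

```python
from collections import defaultdict


def rows_to_fields(rows, suffix="_s"):
    """Group (metadata_name, metadata_value) rows into one field per name.

    Each distinct name becomes a field called name + suffix, holding every
    value seen for that name in row order.
    """
    fields = defaultdict(list)
    for name, value in rows:
        fields[name + suffix].append(value)
    return dict(fields)


# The example table from above, one tuple per database row.
rows = [
    ("title", "blah blah"),
    ("subject", "some subject"),
    ("subject", "another subject"),
    ("name", "some name"),
]

doc = rows_to_fields(rows)
for field, values in doc.items():
    print(field, values)
```

Because the field name is derived from the data, the set of metadata_names does not have to be known up front, which is the whole point of wanting a dynamic field here.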





>> Just starting with DataImportHandler and had a few simple questions.
>>
>> Is there a location for more in depth documentation other than
>> http://wiki.apache.org/solr/DataImportHandler?
>>
>>

>Umm, no, but let us know what is not covered well and it can be added. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Some-basic-DataImportHandler-questions-tp1010291p1024205.html