Re: Confusion about entities and documents
> What I get when I search for, say, "XYZ", is a document that has XYZ Corp as a manufacturer name, but the
> array of parts_manu appears to be a child of the document, not the parts array.
>
> Is this the correct behavior, insofar as a document has a single level of elements, and that's it? If so, what
> might be a better strategy for being able to maintain the hierarchy of information within a document?

Yes, this is the correct behavior. I still struggle with the same issue, and I have not found any 'best practices' for maintaining relationships within a Solr doc. The usual argument is that Solr is not the correct place for these representations and should hold only a flat version of your document.

For a similar question see: http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593

A few possible solutions are posted there, and I'm interested in how others have tackled this issue.

--
View this message in context: http://lucene.472066.n3.nabble.com/Confusion-about-entities-and-documents-tp1753926p1755152.html
Sent from the Solr - User mailing list archive at Nabble.com.
DataImportHandler Error CHARBytesToJavaChars
Anyone ever see this error on an import?

Caused by: java.lang.NullPointerException
    at oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)

The Oracle column being converted is VARCHAR2(4000 Char) and there are NULLs present in the record set.

Environment: Solr 1.4, Windows, Jetty

Full stack trace below:

    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.NullPointerException
    at oracle.jdbc.driver.DBConversion._CHARBytesToJavaChars(DBConversion.java:1015)
    at oracle.jdbc.driver.DBConversion.CHARBytesToJavaChars(DBConversion.java:892)
    at oracle.jdbc.driver.T4CVarcharAccessor.unmarshalOneRow(T4CVarcharAccessor.java:282)
    at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:919)
    at oracle.jdbc.driver.T4CTTIrxd.unmarshal(T4CTTIrxd.java:843)
    at oracle.jdbc.driver.T4C8Oall.receive(T4C8Oall.java:630)
    at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:210)
    at oracle.jdbc.driver.T4CStatement.executeForRows(T4CStatement.java:961)
    at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1072)
    at oracle.jdbc.driver.T4CStatement.executeMaybeDescribe(T4CStatement.java:845)
    at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1154)
    at oracle.jdbc.driver.OracleStatement.executeInternal(OracleStatement.java:1726)
    at oracle.jdbc.driver.OracleStatement.execute(OracleStatement.java:1696)
    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:246)
    ... 32 more

--
View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-Error-CHARBytesToJavaChars-tp1611016p1611016.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler dynamic fields clarification
> Two things, one are your DB column uppercase as this would effect the out.

Interesting, I was under the impression that case does not matter.

From http://wiki.apache.org/solr/DataImportHandler#A_shorter_data-config :
"It is possible to totally avoid the field entries in entities if the names of the fields are same (case does not matter) as those in Solr schema"

I confirmed that, for dynamic fields, the field case in schema.xml must match the case of the database column names. The wiki statement above is therefore incorrect, or at the very least confusing; this is possibly a bug.

My database is Oracle 10g and the column names were created in all uppercase.

In Oracle:
Table name: wide_table
Column names: COLUMN_1 ... COLUMN_100 (yes, uppercase)

Please see the following scenarios and the results I found:

data-config.xml
schema.xml
Result: Nothing imported.

=====

data-config.xml
schema.xml
Result: Note that the query column names were changed to uppercase. Nothing imported.

=====

data-config.xml
schema.xml
Result: Note that ONLY the field entry was changed to caps. All records imported, but with only the COLUMN_100 id field.

=====

data-config.xml
schema.xml
Result: Note that BOTH the field entry in data-config.xml and the dynamicField wildcard in schema.xml were changed to caps. All records imported, with all fields specified. This is the desired behavior.

=====

> Second what does your db-data-config.xml look like

The relevant data-config.xml is as follows:

Ideally, I would rather have the query be 'select * from wide_table', with the fields matched dynamically by column name against the dynamicField wildcard in schema.xml.

--
View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1609578.html
Sent from the Solr - User mailing list archive at Nabble.com.
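The case-sensitivity finding above can be modelled in a few lines. This is only a sketch of the behaviour reported in this message, assuming the dynamicField wildcard is compared against column names case-sensitively; it is not Solr's actual matching code, and `matching_columns` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matching_columns(columns, dynamic_field_pattern):
    """Return the columns whose names match the wildcard, comparing case-sensitively."""
    return [c for c in columns if fnmatchcase(c, dynamic_field_pattern)]


# Column names as created in the Oracle table described above (all uppercase).
columns = ["COLUMN_%d" % i for i in range(1, 101)]

# A lowercase wildcard matches nothing; an uppercase wildcard matches every column.
print(len(matching_columns(columns, "column_*")))
print(len(matching_columns(columns, "COLUMN_*")))
```

Under that assumption, the first two scenarios import nothing because no column matches the lowercase wildcard, while the last one imports all 100 columns.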
DataImportHandler dynamic fields clarification
Looking for some clarification on DIH to make sure I am interpreting this correctly.

I have a wide DB table, 100 columns. I'd rather not have to add 100 values in schema.xml and data-config.xml. I was under the impression that if the column name matched a dynamicField name, it would be added. I am not finding this to be the case; it only works when the column name is explicitly listed as a static field.

Example: 100 column table, columns named 'COLUMN_1, COLUMN_2 ... COLUMN_100'

If I add something like:

to schema.xml, and don't reference the column in a data-config entity/field tag, it gets imported, as expected.

However, if I use:

it does not get imported into Solr, although I would expect it to be.

Is this the expected behavior?

--
View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-dynamic-fields-clarification-tp1606159p1606159.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler and SAXParseExceptions with Jetty
Shawn Heisey-4 wrote:
>
> Because < and > are critical characters in XML, you have to encode them
> to actually use them as part of your config, just as you do on an HTML
> page. Use &lt; instead of <. When I first ran into this, I was
> surprised that &gt; was not required as well, but it's probably a good
> idea to use it, just in case things tighten up in the future.
>

Thanks, confirming this worked using '&lt;' instead of '<'. It would help to note this in the wiki, as I found it confusing that the deltaQuery examples used '<'.

--
View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-and-SAXParseExceptions-with-Jetty-tp1125898p1136004.html
Sent from the Solr - User mailing list archive at Nabble.com.
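The escaping rule can be checked outside Solr with any XML parser. The sketch below uses Python's standard library; the `item` table, the `id` column, and the helper names are made up for illustration and are not taken from the original data-config.xml.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Hypothetical data-config.xml fragment with a raw '<' inside the query attribute.
RAW = """<dataConfig>
  <document>
    <entity name="item" query="select * from item where id < 100"/>
  </document>
</dataConfig>"""

# Same fragment with the '<' written as the predefined entity &lt;
ESCAPED = RAW.replace("id < 100", "id &lt; 100")


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        xml.dom.minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


def query_of(xml_text):
    """Return the query attribute as the application sees it after parsing."""
    dom = xml.dom.minidom.parseString(xml_text)
    return dom.getElementsByTagName("entity")[0].getAttribute("query")


print(is_well_formed(RAW))      # raw '<' in an attribute value is rejected
print(is_well_formed(ESCAPED))  # the escaped form parses
print(query_of(ESCAPED))        # the parser hands back the query with a literal '<'
```

The parser decodes `&lt;` back to `<` before the application reads the attribute, so the SQL sent to the database is unchanged. A raw `>` in an attribute value is accepted by XML parsers, which matches Shawn's observation that only `<` needed encoding.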
DataImportHandler and SAXParseExceptions with Jetty
Win XP, Solr 1.4.1 out of the box install, using Jetty.

If I add a greater than or less than sign (i.e. < or >) in any XML field and attempt to load or run from the DataImportConsole, I receive a SAXParseException. Example follows:

If I don't have a 'less than' it works just fine. I know this must work, because the examples given on the wiki show deltaQueries using a greater than/less than compare.

Relevant snippet from data-config.xml :

Stack trace received:

org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:121)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:222)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:190)
    at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:101)
    at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
    ... 22 more
Caused by: org.xml.sax.SAXParseException: The value of attribute "query" associated with an element type "null" must not contain the '<' character.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:178)
    ... 24 more

--
View this message in context: http://lucene.472066.n3.nabble.com/DataImportHandler-and-SAXParseExceptions-with-Jetty-tp1125898p1125898.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Schema Definition Question
I think I know where you're headed; I was struggling with the same issue. In my case, I use results from Solr to link to a detailed profile by ID, but I display the String value. I was looking for something like:

12345 Feature 1 label 1 Feature 2 label 2

...or something similar, some way of linking child items together. Unfortunately, this isn't how Solr works.

This issue is addressed in the Solr 1.4 book by Smiley and Pugh. This related snippet is from Chapter 2, page 36, dealing with an example application with a music artist's name and a related ID.

"...If we only record the name, then it is problematic to do things like have links in the UI from a band member to that member's detail page... This means that we'll need to have an additional multi-valued field for the member's ID. Multi-valued fields maintain ordering so that the two fields would have corresponding values at a given index. Beware, there can be a tricky case when one of the values can be blank, and you need to come up with a placeholder. The client code would have to know about this placeholder."

So it seems the multivalued fields are guaranteed to keep the same order, which means the same index number can be used to pair a value in one field with its counterpart in the other. This seems clunky to me, but I have not come across any other solutions.

--
View this message in context: http://lucene.472066.n3.nabble.com/Schema-Definition-Question-tp1049966p1105593.html
Sent from the Solr - User mailing list archive at Nabble.com.
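On the client side, the parallel-field approach from the book excerpt could look roughly like the sketch below. The field names `member_id` and `member_name`, the `"-"` placeholder, and the `pair_up` helper are all assumptions made for illustration; the book only says that some placeholder is needed and that the client must know about it.

```python
PLACEHOLDER = "-"  # assumed sentinel stored when a member has no ID


def pair_up(ids, names, placeholder=PLACEHOLDER):
    """Zip two parallel multi-valued fields back into (id, name) pairs.

    Values are matched by position; the placeholder is turned back into None.
    """
    if len(ids) != len(names):
        raise ValueError("parallel fields must have the same number of values")
    return [(None if i == placeholder else i, n) for i, n in zip(ids, names)]


# A flat document as it might come back from a search (hypothetical field names).
doc = {
    "member_id": ["12", "-", "34"],
    "member_name": ["Alice", "Bob", "Carol"],
}

for member_id, name in pair_up(doc["member_id"], doc["member_name"]):
    print(member_id, name)
```

The length check is the important part: if a blank value were simply skipped at index time instead of being replaced by the placeholder, every later pair would silently shift by one position.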
Re: DIH transformer script size limitations with Jetty?
To follow up on my own question, it appears this is only an issue when using the DataImport console debugging tools. It looks like the debugging request sends the data-config.xml via a GET request, which fails. However, with the exact same data-config.xml in a full-import operation (i.e. not a dry run debug), the request appears to be sent via POST and the import works fine.

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-transformer-script-size-limitations-with-Jetty-tp1091246p1100285.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to compile nightly build?
In this particular case I would like to get the trunk. Is there a different link for binary distributions of nightly builds? I had been downloading from here: http://hudson.zones.apache.org/hudson/job/Solr-trunk/lastSuccessfulBuild/artifact/trunk/solr/dist/ In the case I did want to compile from the source, am I missing a step? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-compile-nightly-build-tp1077115p1080266.html Sent from the Solr - User mailing list archive at Nabble.com.
How to compile nightly build?
I am attempting to follow the instructions located at:
http://wiki.apache.org/solr/ExtractingRequestHandler#Getting_Started_with_the_Solr_Example

I have downloaded the most recent clean build from Hudson. After running 'ant example' I get the following error:

C:\solr_build\apache-solr-4.0-2010-07-27_08-06-29>ant example
Buildfile: C:\solr_build\apache-solr-4.0-2010-07-27_08-06-29\build.xml

init-forrest-entities:

compile-lucene:

BUILD FAILED
C:\solr_build\apache-solr-4.0-2010-07-27_08-06-29\common-build.xml:214: C:\solr_build\modules\analysis\common does not exist.

Total time: 0 seconds

=====

What is the correct procedure?

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-compile-nightly-build-tp1077115p1077115.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH and multivariable fields problems
This is looking increasingly like a bug. To recap, I am trying to use the DIH to import multivalued dynamic fields, using a variable to name the field.

Upon further testing, the multivalued import works fine with a static/constant name, but only keeps the first record when the field is named dynamically. See below for relevant snips.

From schema.xml :

From data-config.xml :

This produces the following. Note that the 3 records that should be returned are correctly returned when the field name is a constant.

9892962
record 1
record 2
record 3
Polygraph Newsletter Title
Polygraph Newsletter Title

=====

Now, changing the field name to a variable, note that only the first record is retained for the 'Relation_s' field -- there should be 3 records.

becomes

produces the following:

record 1
Polygraph Newsletter Title
9892962
Polygraph Newsletter Title

Only the first record is retained. There was also another post in the archive (which received no replies) that reported the same issue. The DIH debug logs do show 3 records correctly being returned, so somehow these are not getting added.

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1065244.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH and multivariable fields problems
That's fine, I'm OK with the sub-queries; the additional overhead is to be expected, and that is how I would have thought it would work. I guess my question is: does it work that way at all, or am I misinterpreting something?

Have others successfully imported dynamic multivalued fields in a child entity using the DataImportHandler, with the child entity returning multiple records from an RDBMS?

Someone else had a similar issue recently but no one replied. See:
http://lucene.472066.n3.nabble.com/dataimporthandler-multivalued-dynamic-fields-td684509.html#a684509

Amit Nithian wrote:
>
> That's probably the most efficient way to do it... I believe the line you
> are referring allows you to have sub-entities which, in the RDBMS, would
> execute a separate query for each parent given a primary key. The downside
> to this though is that for each parent you will be executing N separate
> queries.
>

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1033222.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH and multivariable fields problems
Thanks, this helps a great deal, and I may be able to use this method.

Is this how DIH is intended to be used? Should the multiple values be returned in 1 row and then manipulated by a transformer? This is fine, but it is unclear from the documentation. I was under the assumption that multiple rows returned for a child entity with the same parent would be able to create a multivalued entry.

From the DataImportHandler wiki:

"...it is possible to create a multivalued field by joining an entity with another. i.e. if the sub-entity returns multiple rows for one row from parent entity it can go into a multivalued field"

For multiple value fields using the DIH, I use group_concat with the RegexTransformer's splitBy:

ex:

hope that's helpful.

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com

On 8/6/10 4:39 PM, harrysmith wrote:
> I'm having a difficult time understanding how multivariable fields work
> with the DataImportHandler when the source is a RDBMS. I've read the
> following from the wiki:
>

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1033045.html
Sent from the Solr - User mailing list archive at Nabble.com.
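The group_concat plus splitBy suggestion in the message above amounts to returning all child values in one delimited column and splitting that column before indexing. The sketch below models only the splitting step; `split_by` is a hypothetical stand-in for what the RegexTransformer's splitBy attribute does to a row, the `terms` column name is invented, and GROUP_CONCAT itself is a MySQL aggregate, so other databases need their own equivalent.

```python
import re


def split_by(row, column, pattern):
    """Return a copy of row with row[column] split into a list on the regex pattern."""
    out = dict(row)
    value = row.get(column)
    if value is not None:
        out[column] = re.split(pattern, value)
    return out


# One row as a query using GROUP_CONCAT might return it (hypothetical column names).
row = {"id": "9892962", "terms": "record 1,record 2,record 3"}

print(split_by(row, "terms", ","))
```

The delimiter must not occur inside the values themselves, otherwise a single value will be split into several.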
DIH and multivariable fields problems
I'm having a difficult time understanding how multivariable fields work with the DataImportHandler when the source is a RDBMS. I've read the following from the wiki:

--
What is a row?

A row in DataImportHandler is a Map (Map<String, Object>). In the map, the key is the name of the field and the value can be anything which is a valid Solr type. The value can also be a Collection of the valid Solr types (this may get mapped to a multi-valued field). If the DataSource is RDBMS a query cannot emit a multivalued field. But it is possible to create a multivalued field by joining an entity with another. i.e. if the sub-entity returns multiple rows for one row from parent entity it can go into a multivalued field. If the datasource is xml, it is possible to return a multivalued field.
--

How does one 'join an entity with another'?

Below are the relevant sections of my schema.xml and data-config.xml.

schema.xml

=====

data-config.xml

I have multiple terms (rows) in the term_metadata table that are returned from the query, but only the first one gets added. Am I missing something obvious?

--
View this message in context: http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1032893.html
Sent from the Solr - User mailing list archive at Nabble.com.
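One reading of the wiki's 'joining an entity with another' is a nested entity whose query refers to the parent row, so that every child row found for a parent contributes one value to a multi-valued field. The sketch below models that outcome in memory. Apart from term_metadata, every name (`item`, `item_id`, `term`, `build_documents`) is hypothetical, and this illustrates the intended result rather than DIH's implementation.

```python
# Shape of the nested-entity config being modelled (item, item_id and term are
# hypothetical names; ${item.id} refers to the current parent row):
#   <entity name="item" query="select id from item">
#     <entity name="term"
#             query="select term from term_metadata where item_id='${item.id}'"/>
#   </entity>


def build_documents(parent_rows, child_rows, parent_key, child_fk, child_column):
    """Attach all matching child values to each parent as a list (multi-valued field)."""
    docs = []
    for parent in parent_rows:
        doc = dict(parent)
        values = [c[child_column] for c in child_rows
                  if c[child_fk] == parent[parent_key]]
        if values:
            doc[child_column] = values
        docs.append(doc)
    return docs


items = [{"id": 1}, {"id": 2}]
term_metadata = [
    {"item_id": 1, "term": "record 1"},
    {"item_id": 1, "term": "record 2"},
    {"item_id": 2, "term": "record 3"},
]

docs = build_documents(items, term_metadata, "id", "item_id", "term")
print(docs)
```

For the list of values to survive indexing, the target field in schema.xml also has to be declared multi-valued.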
Re: Some basic DataImportHandler questions
Thanks. I think part of my issue may be that I am misunderstanding how to use the entity and field tags to import data in a particular format, and I am looking for a few more examples.

Let's say I have a database table with 2 columns that contain metadata fields and values, and I would like to import this into Solr and keep the pairs together. An example table follows, consisting of two String columns, one containing metadata names and the other metadata values (column names: metadata_name, metadata_value in this example). There may be multiple records for a name. The set of potential metadata_names is unknown; it could be anything.

metadata_name    metadata_value
=============    ==============
title            blah blah
subject          some subject
subject          another subject
name             some name

What is the proper way to import these and keep the name/value pairs intact? I am seeing the following after import:

title subject name
blah blah some subject another subject some name

Ideally, the end goal would be something like below:

some subject some name etc

It feels like I am missing something obvious, and that this would be a common structure for imports.

>> Just starting with DataImportHandler and had a few simple questions.
>>
>> Is there a location for more in depth documentation other than
>> http://wiki.apache.org/solr/DataImportHandler?
>>
> Umm, no, but let us know what is not covered well and it can be added.

--
View this message in context: http://lucene.472066.n3.nabble.com/Some-basic-DataImportHandler-questions-tp1010291p1024205.html
Sent from the Solr - User mailing list archive at Nabble.com.
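The target structure described above, one field per metadata name holding all of its values, can be sketched as follows. The code only shows the desired mapping from rows to a document; the `_s` suffix (meant to land on a string dynamicField) and the `rows_to_document` helper are assumptions, and nothing here is DIH configuration.

```python
def rows_to_document(rows, suffix="_s"):
    """Group metadata_value by metadata_name, one multi-valued field per name."""
    doc = {}
    for row in rows:
        field = row["metadata_name"] + suffix
        doc.setdefault(field, []).append(row["metadata_value"])
    return doc


# The example table from the message above.
rows = [
    {"metadata_name": "title", "metadata_value": "blah blah"},
    {"metadata_name": "subject", "metadata_value": "some subject"},
    {"metadata_name": "subject", "metadata_value": "another subject"},
    {"metadata_name": "name", "metadata_value": "some name"},
]

print(rows_to_document(rows))
```

Because the field name is derived from the data, the schema needs a matching multi-valued dynamicField wildcard (assumed here to be `*_s`) instead of one static field per metadata name.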