Re: no error delta fail with DataImportHandler
the deltaQuery selects product_id but your deltaImportQuery uses ${dataimporter.delta.id}. I guess it should have been ${dataimporter.delta.product_id}.

On Wed, Dec 2, 2009 at 11:52 PM, Thomas Woodard gtfo...@hotmail.com wrote:

I'm trying to get delta indexing set up. My configuration allows a full index no problem, but when I create a test delta of a single record, the delta import finds the record but then does nothing. I can only assume I have something subtly wrong with my configuration, but according to the wiki, my configuration should be valid. What I am trying to do is have a single delta detected on the top-level entity trigger a rebuild of everything under that entity, the same as the first example in the wiki. Any help would be greatly appreciated.

<dataConfig>
  <dataSource name="prodcat" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:oci:@XXX"
      user="XXX" password="XXX" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED"/>
  <document>
    <entity name="product" dataSource="prodcat"
        query="select dp.product_id, dp.display_name, dp.long_description, gp.orientation
               from dcs_product dp, gl_product gp
               where dp.product_id = gp.product_id"
        transformer="ClobTransformer,HTMLStripTransformer"
        deltaImportQuery="select dp.product_id, dp.display_name, dp.long_description, gp.orientation
               from dcs_product dp, gl_product gp
               where dp.product_id = gp.product_id
               AND dp.product_id = '${dataimporter.delta.id}'"
        deltaQuery="select product_id from gl_product_modified
               where last_modified > TO_DATE('${dataimporter.last_index_time}', '-mm-dd hh:mi:ss')"
        rootEntity="false"
        pk="PRODUCT_ID">
      <!-- COLUMN NAMES ARE CASE SENSITIVE. THEY NEED TO BE ALL CAPS OR EVERYTHING FAILS -->
      <field column="PRODUCT_ID" name="product_id"/>
      <field column="DISPLAY_NAME" name="name"/>
      <field column="LONG_DESCRIPTION" name="long_description" clob="true" stripHTML="true"/>
      <field column="ORIENTATION" name="orientation"/>
      <entity name="sku" dataSource="prodcat"
          query="select ds.sku_id, ds.sku_type, ds.on_sale, '${product.PRODUCT_ID}' || '_' || ds.sku_id as unique_id
                 from dcs_prd_chldsku dpc, dcs_sku ds
                 where dpc.product_id = '${product.PRODUCT_ID}'
                 and dpc.sku_id = ds.sku_id"
          rootEntity="true" pk="PRODUCT_ID, SKU_ID">
        <field column="SKU_ID" name="sku_id"/>
        <field column="SKU_TYPE" name="sku_type"/>
        <field column="ON_SALE" name="on_sale"/>
        <field column="UNIQUE_ID" name="unique_id"/>
        <entity name="catalog" dataSource="prodcat"
            query="select pc.catalog_id
                   from gl_prd_catalog pc, gl_sku_catalog sc
                   where pc.product_id = '${product.PRODUCT_ID}' and sc.sku_id = '${sku.SKU_ID}'
                   and pc.catalog_id = sc.catalog_id"
            pk="SKU_ID, CATALOG_ID">
          <field column="CATALOG_ID" name="catalogs"/>
        </entity>
        <entity name="price" dataSource="prodcat"
            query="select ds.list_price as price from dcs_sku ds where ds.sku_id = '${sku.SKU_ID}' and ds.on_sale = 0
                   UNION
                   select ds.sale_price as price from dcs_sku ds where ds.sku_id = '${sku.SKU_ID}' and ds.on_sale = 1"
            pk="SKU_ID">
          <field column="PRICE" name="price"/>
        </entity>
      </entity>
      <entity name="studio" dataSource="prodcat"
          query="select gs.name from gl_product_studio gps, gl_studio gs
                 where gps.studio_id = gs.studio_id and gps.product_id = '${product.PRODUCT_ID}'"
          rootEntity="false" pk="PRODUCT_ID">
        <field column="NAME" name="studio"/>
      </entity>
      <entity name="star" dataSource="prodcat"
          query="select gc.name from gl_contributor gc, gl_product_contributor gpc
                 where gc.contributor_id = gpc.contributor_id and gpc.product_id = '${product.PRODUCT_ID}'"
          rootEntity="false" pk="PRODUCT_ID, CONTRIBUTOR_ID">
        <field column="NAME" name="stars"/>
      </entity>
      <entity name="director" dataSource="prodcat"
          query="select gc.name from gl_contributor gc, gl_product_director gpd
                 where gc.contributor_id = gpd.contributor_id and gpd.product_id = '${product.PRODUCT_ID}'"
          rootEntity="false" pk="PRODUCT_ID, CONTRIBUTOR_ID">
        <field column="NAME" name="directors"/>
      </entity>
      <entity name="keyword" dataSource="prodcat"
          query="select dcs_category.display_name as keyword_name
                 from dcs_cat_chldprd, dcs_category, gl_category
                 where gl_category.availability = 0 and gl_category.exclude_in_vivisimo = 0 and
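The suggested fix, sketched against the config above (a hedged sketch, untested against this schema): the variable name in deltaImportQuery must match the column that deltaQuery returns, so with deltaQuery returning product_id the reference becomes ${dataimporter.delta.product_id}:

```xml
<!-- Sketch: deltaQuery returns a column named product_id, so
     deltaImportQuery must reference ${dataimporter.delta.product_id},
     not ${dataimporter.delta.id}. -->
<entity name="product" dataSource="prodcat" pk="PRODUCT_ID" rootEntity="false"
    deltaQuery="select product_id from gl_product_modified
                where last_modified > TO_DATE('${dataimporter.last_index_time}', ...)"
    deltaImportQuery="select dp.product_id, dp.display_name, dp.long_description, gp.orientation
                from dcs_product dp, gl_product gp
                where dp.product_id = gp.product_id
                and dp.product_id = '${dataimporter.delta.product_id}'">
  ...
</entity>
```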
Re: getting value from parent query in subquery transformer
you do not need to pass the values as shown here. Make use of the Context parameter (the second implicit parameter to the transformer) to get hold of the value of ${item.category}:

context.getVariableResolver().resolve("item.category")

On Wed, Dec 2, 2009 at 7:20 PM, Joel Nylund jnyl...@yahoo.com wrote:

Hi, I have an entity that has an entity within it that executes a query for each row and calls a transformer. Is there a way to pass a value from the parent query into the transformer? For example, I have an entity called document, and it has an ID and sometimes it has a category. I have a sub-entity called category that does another complex query using the document's ID to get data to send to the transformer to determine the category. I would like to pass the parent's category to this transformer, so I don't have to join in data I already have. Is this possible? I'm using ${item.id} in the where clause, so I guess I'm wondering, can I do something like:

<entity name="item" query="..">
  <entity name="category" transformer="script:SplitAndPrettyCategory(${item.category})" query="..">

thanks Joel

--
- Noble Paul | Principal Engineer| AOL | http://aol.com
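A custom Java transformer can read the parent entity's value through the Context, as suggested above. A minimal sketch, not a tested implementation — the class name and the "category" row key are hypothetical; only Context.getVariableResolver().resolve() comes from the advice in the reply:

```java
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Hypothetical transformer: reads the parent entity's "item.category"
// value via the variable resolver instead of receiving it as a parameter.
public class SplitAndPrettyCategory extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // resolve() looks up variables exposed by enclosing entities
        Object parentCategory = context.getVariableResolver().resolve("item.category");
        if (parentCategory != null) {
            row.put("category", parentCategory.toString().trim());
        }
        return row;
    }
}
```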
Re: no error delta fail with DataImportHandler
probably you can try out this http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta and it may give you more info on what is happening.

On Thu, Dec 3, 2009 at 10:58 PM, Thomas Woodard gtfo...@hotmail.com wrote:

Unfortunately that isn't it. I have tried id, product_id, and PRODUCT_ID, and they all produce the same result. It finds the modified item, but then does nothing.

INFO: Running ModifiedRowKey() for Entity: product
Dec 3, 2009 9:25:25 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity product with URL: jdbc:oracle:oci:@dev.eline.com
Dec 3, 2009 9:25:25 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 283
Dec 3, 2009 9:25:25 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: product rows obtained : 1
Dec 3, 2009 9:25:25 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: product rows obtained : 0
Dec 3, 2009 9:25:25 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: product
Dec 3, 2009 9:25:25 AM org.apache.solr.handler.dataimport.DocBuilder doDelta
INFO: Delta Import completed successfully
Dec 3, 2009 9:25:25 AM org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.404
Re: Exception encountered during replication on slave....Any clues?
are you able to hit http://localhost:8080/postingsmaster/replication using a browser from the slave box? If you are able to hit it, what do you see?

On Tue, Dec 8, 2009 at 3:42 AM, William Pierce evalsi...@hotmail.com wrote:

Just to make doubly sure, per tck's suggestion, I went in and explicitly added the port in the masterUrl so that it now reads http://localhost:8080/postingsmaster/replication. Still getting the same exception... I am running Solr 1.4 on Ubuntu karmic, using Tomcat 6 and Java 1.6. Thanks, - Bill

From: William Pierce evalsi...@hotmail.com
Sent: Monday, December 07, 2009 2:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Exception encountered during replication on slave....Any clues?

tck, thanks for your quick response. I am running on the default port (8080). If I copy that exact string given in the masterUrl and execute it in the browser I get a response from Solr:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">OK</str>
  <str name="message">No command</str>
</response>

So the masterUrl is reachable/accessible so far as I am able to tell. Thanks, - Bill

From: TCK moonwatcher32...@gmail.com
Sent: Monday, December 07, 2009 1:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Exception encountered during replication on slave....Any clues?

are you missing the port number in the master's url? -tck

On Mon, Dec 7, 2009 at 4:44 PM, William Pierce evalsi...@hotmail.com wrote:

Folks: I am seeing this exception in my logs that is causing my replication to fail. I start with a clean slate (empty data directory). I index the data on the postingsmaster using the dataimport handler and it succeeds. When the replication slave attempts to replicate, it encounters this error:

Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost/postingsmaster/replication is not available. Index fetch failed.
Exception: Invalid version or the data in not in 'javabin' format

Any clues as to what I should look for to debug this further? Replication is enabled as follows. The postingsmaster solrconfig.xml looks as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'; it can also be 'commit' -->
    <str name="replicateAfter">commit</str>
    <!-- If configuration files need to be replicated give the names here, comma separated -->
    <str name="confFiles"></str>
  </lst>
</requestHandler>

The postings slave solrconfig.xml looks as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- fully qualified url for the replication handler of master -->
    <str name="masterUrl">http://localhost/postingsmaster/replication</str>
    <!-- Interval in which the slave should poll master. Format is HH:mm:ss.
         If this is absent the slave does not poll automatically.
         But a snappull can be triggered from the admin or the http API -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>

Thanks, - Bill

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: how to set CORE when using Apache Solr extension?
the core is a part of the URI: http://host:port/solr-app/core-name/select. Say the core name is core1 and the Solr app, named solr, is deployed at port 8983; then it would look like http://host:8983/solr/core1/select.

On Tue, Dec 8, 2009 at 3:44 AM, regany re...@newzealand.co.nz wrote:

Hello, can anyone tell me how you set which Solr CORE to use when using the Apache Solr extension? (Using Solr with multicores) http://www.php.net/manual/en/book.solr.php thanks, regan

--
View this message in context: http://old.nabble.com/how-to-set-CORE-when-using-Apache-Solr-extension--tp26685174p26685174.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: Oddly slow replication
this has to be a network problem. We have never encountered such vastly different speeds in the same LAN.

On Tue, Dec 8, 2009 at 3:22 AM, Simon Wistow si...@thegestalt.org wrote:

I have a Master server with two Slaves populated via Solr 1.4 native replication. Slave1 syncs at a respectable speed, i.e. around 100MB/s, but Slave2 runs much, much slower - the peak I've seen is 56KB/s. Both are running off the same hardware with the same config - compression is set to 'internal' and http(Conn|Read)Timeout are defaults (5000/1). I've checked to see if it was a disk problem using dd, and whether it was a network problem by doing a manual scp and an rsync from the slave to the master and the master to the slave. I've shut down the replication polling on Slave1 just to see if that was causing the problem, but there's been no improvement. Any ideas?

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: How to setup dynamic multicore replication
On Tue, Dec 8, 2009 at 2:43 PM, Thijs vonk.th...@gmail.com wrote:

Hi, I need some help setting up dynamic multicore replication. We are changing our setup from a replicated single-core index with multiple document types, as described on the wiki[1], to a dynamic multicore setup. We need this so that we can display facets with a zero count that are unique to the document 'type'. So when indexing new documents we want to create new cores on the fly using the CoreAdminHandler through SolrJ. What I can't figure out is how I set up solr.xml and solrconfig.xml so that a core is automatically also replicated from the master to its slaves once it is created. I have a solr.xml that starts like this:

<?xml version='1.0' encoding='UTF-8'?>
<solr persistent="true">
  <cores adminPath="/admin/cores">
  </cores>
</solr>

and the replication part of solrconfig.xml. Master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml</str>
  </lst>
</requestHandler>

Slave:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://localhost:8081/solr/replication</str>
    <str name="pollInterval">00:00:20</str>
  </lst>
</requestHandler>

I think I should change the masterUrl in the slave configuration to something like:

<str name="masterUrl">http://localhost:8081/solr/${solr.core.name}/replication</str>

so that the replication automatically finds the correct core replication handler.

If you have dynamically created cores, this is the solution.

But how do I tell the slaves a new core is created, and that it should start replicating it too? Thanks in advance. Thijs

[1] http://wiki.apache.org/solr/MultipleIndexes#Flattening_Data_Into_a_Single_Index

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: Tika and DIH integration (https://issues.apache.org/jira/browse/SOLR-1358)
we are very close to resolving SOLR-1358, so you may be able to use it.

On Tue, Dec 8, 2009 at 5:32 PM, Jorg Heymans jorg.heym...@gmail.com wrote:

Hi, I am looking into using Solr for indexing a large database that has documents (mostly pdf and msoffice) stored as CLOBs in several tables. It is my understanding that the DIH as provided in Solr 1.4 cannot index these CLOBs yet, and that SOLR-1358 should provide exactly this. So I was wondering what the most 'recommended' way is of solving this. Should it be done with a custom text extractor of some sort, set on the column/field? Thanks, Jorg

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: Replicating multiple cores
On Wed, Dec 9, 2009 at 6:14 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote:

Yes. I'd highly recommend using the Java replication though.

Is there a reason? I understand it's new etc., however I think one issue with it is its somewhat non-native access to the filesystem. Can you illustrate a real-world advantage other than the enhanced admin screens?

Complexity is the main problem with rsync-based replication: you have to manage so many processes and monitor them separately. The other problem is managing snapshots. These snapshots need to be cleaned up every now and then, and you do not have enough info on what is happening/happened.

On Mon, Dec 7, 2009 at 11:13 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Tue, Dec 8, 2009 at 11:48 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: If I've got multiple cores on a server, I guess I need multiple rsyncd's running (if using the shell scripts)? Yes. I'd highly recommend using the Java replication though. -- Regards, Shalin Shekhar Mangar.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: indexing XML with solr example webapp - out of java heap space
the post.jar does not stream; use curl if you are using *nix. --Noble

On Wed, Dec 9, 2009 at 12:28 AM, Feroze Daud fero...@zillow.com wrote:

Hi! I downloaded Solr and am trying to index an XML file. This XML file is huge (500M). When I try to index it using the post.jar tool in example\exampledocs, I get an out-of-java-heap-space error in the SimplePostTool application. Any ideas how to fix this? Passing in -Xms1024M does not fix it. Feroze.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: DIH solrconfig
On Wed, Dec 9, 2009 at 3:34 PM, Lee Smith l...@weblee.co.uk wrote:

Hi All. There seems to be a massive difference between the solrconfig in the DIH example and the one in the normal example. Would I be correct in saying that if I were to add the dataimport request handler to solrconfig.xml, that's all I will need? i.e.:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

Is this correct?

Yep, this is all you need.

Lee

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: Exception encountered during replication on slave....Any clues?
try the URL http://localhost:8080/postingsmaster/replication?command=indexversion using your browser.

On Tue, Dec 8, 2009 at 9:56 PM, William Pierce evalsi...@hotmail.com wrote:

Hi, Noble: When I hit the masterUrl from the slave box at http://localhost:8080/postingsmaster/replication I get the following XML response:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">OK</str>
  <str name="message">No command</str>
</response>

And then when I look in the logs, I see the exception that I mentioned. What exactly does this error mean, that replication is not available? By the way, when I go to the admin URL for the slave and click on replication, I see a screen with the master URL listed (as above) and the word "unreachable" after it. And, of course, the same exception shows up in the Tomcat logs. Thanks, - Bill

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: Custom Field sample?
how exactly do you wish to query these documents?

On Fri, Dec 11, 2009 at 4:35 PM, Antonio Zippo reven...@yahoo.it wrote:

I need to add these features to each document:

Document1
---
Argument1, positive
Argument2, positive
Argument3, neutral
Argument4, positive
Argument5, negative
Argument6, negative

Document2
---
Argument1, negative
Argument2, positive
Argument3, negative
Argument6, negative
Argument7, neutral

where the argument name is dynamic. Using a relational database I could use a master-detail structure, but in Solr? I thought about a Map or Pair field.

From: Grant Ingersoll gsing...@apache.org
To: solr-user@lucene.apache.org
Sent: Thursday, December 10, 2009, 19:47:55
Subject: Re: Custom Field sample?

Can you perhaps give a little more info on what problem you are trying to solve? FWIW, there are a lot of examples of custom FieldTypes in the Solr code.

On Dec 10, 2009, at 11:46 AM, Antonio Zippo wrote:

Hi all, could you help me create a custom field? I need to create a field structured like a Map. Is it possible? How do I define whether the search string matches the key or the value (or both)? A way could be to create a char-separated multivalued string field, but it isn't the best way, and with facets it is the worst way. Could you give me a custom field sample? Thanks in advance, Revenge

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: question regarding dynamic fields
use a copyField to copy those fields to another field and search on that.

On Mon, Dec 14, 2009 at 1:00 PM, Phanindra Reva reva.phanin...@gmail.com wrote:

Hello.., I have observed that text or keywords indexed using the dynamicField concept are searchable only when we mention the field name while querying. Am I wrong with my observation, or is that the default and cannot be changed? I am just wondering if there is any route to search text indexed using dynamicFields without having to mention the field name in the query. Thanks.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
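The suggestion above as a schema.xml sketch. The field and type names here are illustrative (they assume a "text" fieldType exists, as in the example schema); the idea is that all matching dynamic fields are copied into one catch-all field, which is then made the default search field so queries need no field prefix:

```xml
<!-- Illustrative fragment: copy all attr_* dynamic fields into a
     single catch-all "text" field and search that by default. -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
<dynamicField name="attr_*" type="text" indexed="true" stored="true"/>

<copyField source="attr_*" dest="text"/>

<defaultSearchField>text</defaultSearchField>
```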
Re: Request Assistance with DIH
On Sat, Dec 12, 2009 at 6:15 AM, Robbin rob...@drivesajeep.com wrote:

I've been trying to use the DIH with Oracle and would love it if someone could give me some pointers. I put the ojdbc14.jar in both the Tomcat lib and solr home/lib. I created a dataimport.xml and enabled it in the solrconfig.xml. I go to http://solr server/solr/admin/dataimport.jsp. This all seems to be fine, but I get the default page response and it doesn't look like the connection to the Oracle server is even attempted.

Did you trigger an import? What is the message on the web page and what do the logs say?

I'm using the Solr 1.4 release from Nov 10. Do I need an Oracle client on the server? I thought having the ojdbc jar should be sufficient. Any help or configuration examples for setting this up would be much appreciated.

You need all the jars you would normally use to connect to Oracle.

Thanks, Robbin

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: apache-solr-common.jar
there is no solr-common jar anymore. You may use the solrj jar, which contains all the classes that were in the common jar.

On Mon, Dec 14, 2009 at 9:22 PM, gudumba l gudumba.sm...@gmail.com wrote:

Hello All, I have been using apache-solr-common-1.3.0.jar in my module. I am planning to shift to the latest version because, of course, it has more flexibility. But it is really strange that I don't find any corresponding jar in the latest version. I have searched the whole Apache Solr 1.4 folder (downloaded from the site) but have not found any. I am sorry, it's really silly to request a jar, but I have no option. Thanks.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: solr core size on disk
look at the index dir and see the size of the files. It is typically in $SOLR_HOME/data/index.

On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote:

Hi, I am new to Solr. Here is my question: how do I find out the size of a Solr core on disk? Thank you, matt

--
- Noble Paul | Systems Architect| AOL | http://aol.com
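For a default single-core layout, that check is a one-liner (the path is the typical default mentioned above; adjust it if solrconfig.xml overrides dataDir):

```shell
# Typical default location; adjust if <dataDir> is overridden in solrconfig.xml.
du -sh $SOLR_HOME/data/index
```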
Re: shards parameter
yes. Put it under the defaults section in your standard requestHandler.

On Thu, Dec 17, 2009 at 5:22 PM, pcurila p...@eea.sk wrote:

Hello, is there any way to configure the shards parameter in solrconfig.xml, so I do not need to provide it in the URL? Thanks, Peter

--
View this message in context: http://old.nabble.com/shards-parameter-tp26826908p26826908.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
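A hedged solrconfig.xml sketch of the advice above (the shard host names and cores are placeholders): parameters in the defaults section are applied to every request the handler serves, so shards no longer needs to appear in the URL:

```xml
<!-- Sketch: the shard URLs below are placeholders for your own hosts/cores. -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards">host1:8983/solr,host2:8983/solr</str>
  </lst>
</requestHandler>
```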
Re: Is DataImportHandler ThreadSafe???
On Sat, Dec 19, 2009 at 2:16 PM, gurudev suyalprav...@yahoo.com wrote:

Hi, just wanted to know: is the DataImportHandler available in Solr 1.3 thread-safe? I would like to use multiple instances of the data import handler running concurrently, posting my various sets of data from the DB to the index. Can I do this by registering the DIH multiple times with various names in solrconfig.xml and then invoking them all concurrently to achieve maximum throughput? Would I need to define different data-config.xml's and dataimport.properties for each DIH?

Yes, this should work. It is thread-safe.

Would it be possible to specify the query in data-config.xml to restrict one DIH from overlapping the data set fetched by another DIH through some SQL clauses?

--
View this message in context: http://old.nabble.com/Is-DataImportHandler-ThreadSafetp26853521p26853521.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
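Registering several DIH instances, as asked above, is just a matter of repeating the handler entry with distinct names and config files. A sketch — the handler names and config file names below are illustrative, not prescribed:

```xml
<!-- Illustrative: two independent DIH instances, each with its own
     data-config file, invokable concurrently at their own URLs. -->
<requestHandler name="/dataimport-products" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">products-data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-users" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">users-data-config.xml</str>
  </lst>
</requestHandler>
```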
Re: Documents are indexed but not searchable
just search for *:* and see if the docs are indeed there in the index. --Noble

On Mon, Dec 21, 2009 at 9:26 AM, krosan kro...@gmail.com wrote:

Hi, I'm trying to test Solr for a proof-of-concept project, but I'm having some problems. I indexed my document, but when I search for a word which is 100% certain to be in the document, I don't get any hits. These are my files. First, my data-config.xml:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://host.com:3306/crossfire3" user="user" password="pass" batchSize="1"/>
  <document>
    <entity name="users" query="select username, password, email from users">
      <field column="username" name="username"/>
      <field column="password" name="password"/>
      <field column="email" name="email"/>
    </entity>
  </document>
</dataConfig>

Now, I have used this in the debugger, with commit on and verbose on, and I get this reply: http://pastebin.com/m7a460711. This clearly states that those 2 rows have been processed and are now in the index. However, when I try to do a search with the HTTP parameters, I get this response. For the hyperlink http://localhost:8080/solr/select?q=username:krosan&debugQuery=on this is the response: http://pastebin.com/m7bb1dcaa

I'm clueless as to what the problem could be! These are my two config files: schema.xml: http://pastebin.com/m1fd1da58 and solrconfig.xml: http://pastebin.com/m44b73d83 (look for krosan in the documents to see what I've added to the standard docs). Any help will be greatly appreciated! Thanks in advance, Andreas Evers

--
View this message in context: http://old.nabble.com/Documents-are-indexed-but-not-searchable-tp26868925p26868925.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
- Noble Paul | Systems Architect| AOL | http://aol.com
Re: suggestions for DIH batchSize
A bigger batchSize results in increased memory usage. I guess performance should be slightly better with bigger values, but I have not verified that.

On Wed, Dec 23, 2009 at 2:51 AM, Joel Nylund jnyl...@yahoo.com wrote:

Hi, it looks like from the code that the default is 500. Is that the recommended setting? Has anyone noticed any significant performance/memory tradeoffs from making this much bigger? thanks, Joel

--
- Noble Paul | Systems Architect| AOL | http://aol.com
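batchSize is an attribute on the dataSource element. A sketch (the JDBC URL and credentials are placeholders): larger values fetch more rows per round trip at the cost of the memory tradeoff described above, and the special value -1 enables MySQL result streaming:

```xml
<!-- Sketch: URL/credentials are placeholders. batchSize maps to the JDBC
     fetch size; batchSize="-1" enables row streaming on MySQL
     (fetchSize = Integer.MIN_VALUE). -->
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://host/db" user="user" password="pass"
    batchSize="1000"/>
```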
Re: Problem with simple use of DIH
did you run it w/o the debug? On Sun, Dec 27, 2009 at 6:31 PM, AHMET ARSLAN iori...@yahoo.com wrote: I'm trying to use DataImportHandler to load my index and having some strange results. I have two tables in my database. DPRODUC contains products and FSKUMAS contains the skus related to each product. This is the data-config I'm using:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.ibm.as400.access.AS400JDBCDriver" url="jdbc:as400:IWAVE;prompt=false;naming=system" user="IPGUI" password="IPGUI"/>
  <document>
    <entity name="dproduc" query="select dprprd, dprdes from dproduc where dprprd like 'F%'">
      <field column="dprprd" name="id"/>
      <field column="dprdes" name="name"/>
      <entity name="fskumas" query="select fsksku, fcoclr, fszsiz, fskret from fskumas where dprprd='${dproduc.DPRPRD}'">
        <field column="fsksku" name="sku"/>
        <field column="fcoclr" name="color"/>
        <field column="fszsiz" name="size"/>
        <field column="fskret" name="price"/>
      </entity>
    </entity>
  </document>
</dataConfig>

What is the primary key of the dproduc table? If it is dprprd, can you try adding pk="dprprd" to <entity name="dproduc">?

<entity name="dproduc" pk="dprprd" query="select dprprd, dprdes from dproduc where dprprd like 'F%'">

-- Noble Paul | Systems Architect | AOL | http://aol.com
Re: Problem with simple use of DIH
The field names are case sensitive. But if the <field> tags are missing, they are mapped to the corresponding Solr fields in a case-insensitive way. Apparently all the fields come out of your database in ALL CAPS, so you should put the 'column' values in ALL CAPS too. On Sun, Dec 27, 2009 at 9:03 PM, Jay Fisher jay.l.fis...@gmail.com wrote: I did run it without debug, and the result was that 0 documents were processed. The problem seems to be with the <field> tags that I was using to map from the table column names to the schema.xml field names. I switched to using an AS clause in the SQL statement instead, and it worked. I think the column names may be case-sensitive, although I haven't proven that to be the case. I did discover that references to column names in the velocity template are case sensitive; ${dproduc.DPRPRD} works and ${dproduc.dprprd} does not. Thanks, Jay 2009/12/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: did you run it w/o the debug? -- Noble Paul | Systems Architect | AOL | http://aol.com
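To make the fix from this thread concrete: either upper-case the column attributes to match what the driver reports, or alias the columns in SQL to the schema field names. A sketch of the first option, based on the config in the thread:

```xml
<!-- Column labels come back from the AS/400 driver in ALL CAPS,
     so the column attributes are upper-cased to match. -->
<entity name="dproduc" pk="dprprd"
        query="select dprprd, dprdes from dproduc where dprprd like 'F%'">
  <field column="DPRPRD" name="id"/>
  <field column="DPRDES" name="name"/>
</entity>
```

The equivalent AS-clause approach (the one Jay used) aliases each column directly to the schema field name, e.g. `select dprprd as id, dprdes as name ...`, so no `<field>` mappings are needed at all.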
Re: fl parameter and dynamic fields
if you wish to search on fields using a wild-card, you have to use a copyField to copy all the values of Bool_* to another field and search on that field. On Tue, Dec 29, 2009 at 4:14 AM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote: I use dynamic fields heavily in my Solr config. I would like to be able to specify which fields should be returned from a query based on a pattern for the field name. For instance, given <dynamicField name="Bool_*" type="boolean" indexed="true" stored="true"/> I might be able to construct a query like http://localhost:8080/solr/select?q=Bool_*:true&rows=10 Is there something like this in Solr? Thanks, Tim Harsch -- Noble Paul | Systems Architect | AOL | http://aol.com
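The copyField Noble suggests would look something like this sketch in schema.xml (the destination field name all_bools is made up):

```xml
<!-- Gather every Bool_* value into one searchable catch-all field. -->
<dynamicField name="Bool_*" type="boolean" indexed="true" stored="true"/>
<field name="all_bools" type="boolean" indexed="true" stored="false" multiValued="true"/>
<copyField source="Bool_*" dest="all_bools"/>
```

You can then query q=all_bools:true. Note this covers searching across the pattern; as far as I know, the fl parameter in Solr 1.4 has no wildcard support, so returning stored fields by name pattern still isn't possible.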
Re: serialize SolrInputDocument to java.io.File and back again?
What serialization would you wish to use? You can use plain Java serialization, or solrj can serialize it for you as XML or in the javabin format (org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec). On Thu, Dec 31, 2009 at 6:55 AM, Phillip Rhodes rhodebumpl...@gmail.com wrote: I want to store a SolrInputDocument on the filesystem until it can be sent to the Solr server via the solrj client. I will be using a Quartz job to periodically query a table that contains a listing of SolrInputDocuments stored as java.io.File objects that need to be processed. Thanks for your time. -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: replicating extension JARs
Jars are not replicated; that is by design. But that is not to say we can't do it. Please open an issue. On Wed, Jan 6, 2010 at 6:20 AM, Ryan Kennedy rcken...@gmail.com wrote: Will the built-in Solr replication replicate extension JAR files in the lib directory? The documentation appears to indicate that only the index and any specified configuration files will be replicated; however, if your solrconfig.xml references a class in a JAR file added to the lib directory, then you'll need that replicated as well (otherwise the slave will encounter NoClassDefFoundError exceptions). I'm wondering if I'm missing something and Solr replication will do that, or if it's a deficiency in Solr's replication. Ryan -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: Solr Replication Questions
On Wed, Jan 6, 2010 at 2:51 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: http://wiki.apache.org/solr/SolrReplication I've been looking over this replication wiki and I'm still unclear on two points about Solr replication: 1. If there have been small changes to the index on the master, does the slave copy the entire contents of the index files that were affected? only the delta is copied. a. Let's say I add one document to the master. Presumably that causes changes to the position file, amidst a few others. Does the slave download the entire position file? Or just the portion that was changed? Lucene never modifies a file that was written by previous commits. So if you add a new document and commit, it is written to new files. Solr replication will only replicate those new files. 2. If you have a multi-core slave, is it possible to share one configuration file (i.e. one instance directory) amidst the multiple cores, and yet have each core poll a different master? a. Can you set the masterUrl for each core separately in the server.xml? Thanks for your help, Gio. -- Noble Paul | Systems Architect | AOL | http://aol.com
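On question 2, which went unanswered above: one common sketch (host names, core names, and the property name are assumptions) is to share one instanceDir and give each core its own masterUrl as a core property in solr.xml, then reference it via variable substitution in the shared solrconfig.xml:

```xml
<!-- Hypothetical solr.xml: two cores sharing one instanceDir, each polling a different master. -->
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="shared/">
    <property name="masterUrl" value="http://master-a:8983/solr/core0/replication"/>
  </core>
  <core name="core1" instanceDir="shared/">
    <property name="masterUrl" value="http://master-b:8983/solr/core1/replication"/>
  </core>
</cores>
```

In the shared solrconfig.xml the slave section would then read `<str name="masterUrl">${masterUrl}</str>`. (Note the per-core settings live in solr.xml, not server.xml.)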
Re: readOnly=true IndexReader
On Wed, Jan 6, 2010 at 4:26 PM, Patrick Sauts patrick.via...@gmail.com wrote: In the wiki page http://wiki.apache.org/lucene-java/ImproveSearchingSpeed I've found: "Open the IndexReader with readOnly=true. This makes a big difference when multiple threads are sharing the same reader, as it removes certain sources of thread contention." How do I open the IndexReader with readOnly=true? I can't find anything related to this parameter. Do the JVM parameters -Dslave=disabled or -Dmaster=disabled have any effect on Solr with a standard solrConfig.xml? these are not variables used by Solr. They are just substituted in solrconfig.xml and probably consumed by the ReplicationHandler (this is not a standard). Thank you for your answers. Patrick. -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: replication -- missing field data file
On Wed, Jan 6, 2010 at 9:49 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I set up replication between 2 cores on one master and 2 cores on one slave. Before doing this the master was working without issues, and I stopped all indexing on the master. Now that replication has synced the index files, an .fdt (field data) file is suddenly missing on both the master and the slave. Pretty much every operation (core reload, commit, add document) fails with an error like the one posted below. How could this happen? How can one recover from such an error? Is there any way to regenerate the .fdt file without re-indexing everything? This brings me to a question about backups. If I run the replication?command=backup command, where is this backup stored? I've tried this a few times and get an OK response from the machine, but I don't see the backup generated anywhere. The backup is done asynchronously, so it always gives an OK response immediately. The backup is created in the data dir itself. Thanks, Gio.
org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:412) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:142) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:298) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:579) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:486) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:409) ... 18 more Caused by: java.io.FileNotFoundException: Y:\solrData\FilingsCore2\index\_a0r.fdt (The system cannot find the file specified) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(Unknown Source) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:78) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:108) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:65) at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:104) at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:277) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:640) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:599) at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:103) at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27) at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:73) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:68) at org.apache.lucene.index.IndexReader.open(IndexReader.java:476) at
Re: replication -- missing field data file
The index dir is the one named "index"; the others (backups) will be stored as index.<date-as-number>. On Wed, Jan 6, 2010 at 10:31 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you differentiate between the backup and the normal index files? -Original Message- From: noble.p...@gmail.com On Behalf Of Noble Paul Sent: Wednesday, January 06, 2010 11:52 AM To: solr-user Subject: Re: replication -- missing field data file
Re: replication -- missing field data file
Actually it does not. BTW, FYI: the backup command is just for taking periodic backups; it is not necessary for the ReplicationHandler to work. On Thu, Jan 7, 2010 at 2:37 AM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: How can you tell when the backup is done? -Original Message- From: noble.p...@gmail.com On Behalf Of Noble Paul Sent: Wednesday, January 06, 2010 12:23 PM To: solr-user Subject: Re: replication -- missing field data file
Re: Synonyms from Database
On Sun, Jan 10, 2010 at 1:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Ravi, I think if your synonyms were in a DB, it would be trivial to periodically dump them into the text file Solr expects. You wouldn't want to hit the DB to look up synonyms at query time... Why query time? Can it not be done at startup time? Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message From: Ravi Gidwani ravi.gidw...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 9, 2010 10:20:18 PM Subject: Synonyms from Database Hi: Has any work been done on providing synonyms from a database instead of the synonyms.txt file? The idea is to have a dictionary in the DB that can be enhanced on the fly in the application. This can then be used at query time to check for synonyms. I know I am not giving thought to the performance implications of this approach, but I would love to hear others' thoughts. ~Ravi. -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: Data Full Import Error
You need more memory to run the dataimport. On Tue, Jan 12, 2010 at 4:46 PM, Lee Smith l...@weblee.co.uk wrote: Hi All, I am trying to do a data import but I am getting the following error. INFO: [] webapp=/solr path=/dataimport params={command=status} status=0 QTime=405 2010-01-12 03:08:08.576::WARN: Error for /solr/dataimport java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:05 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed java.lang.OutOfMemoryError: Java heap space Exception in thread btpool0-2 java.lang.OutOfMemoryError: Java heap space Jan 12, 2010 3:08:14 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Jan 12, 2010 3:08:21 AM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback Jan 12, 2010 3:08:23 AM org.apache.solr.update.SolrIndexWriter finalize SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! That last message is OK; don't bother about it. Any ideas what this can be? Hope you can help. Lee -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: Data Full Import Error
It is set by the way you start your Solr server (the -Xmx JVM option), e.g. java -Xmx1024m -jar start.jar. On Tue, Jan 12, 2010 at 6:00 PM, Lee Smith l...@weblee.co.uk wrote: Thank you for your response. Will I just need to adjust the allowed memory in a config file, or is this a server issue? Sorry, I know nothing about Java. Hope you can advise! On 12 Jan 2010, at 12:26, Noble Paul നോബിള് नोब्ळ् wrote: You need more memory to run dataimport. -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: DataImportHandler - synchronous execution
It can be added. On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba ase...@gmail.com wrote: Hi, I found that there's no explicit option to run DataImportHandler in a synchronous mode. I need that option to run DIH from SolrJ (EmbeddedSolrServer) in the same thread. Currently I pass a dummy stream to DIH as a workaround for this, but I think it makes sense to add a specific option for that. Any objections? Alex -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: NullPointerException in ReplicationHandler.postCommit + question about compression
When you copy-paste config from the wiki, just copy what you need, excluding the documentation and comments. On Wed, Jan 13, 2010 at 12:51 AM, Stephen Weiss swe...@stylesight.com wrote: Hi Solr List, We're trying to set up Java-based replication with Solr 1.4 (dist tarball). We are running this to start with on a pair of test servers just to see how things go. There's one major problem we can't seem to get past. When we replicate manually (via the admin page) things seem to go well. However, when replication is triggered by a commit event on the master, the master gets a NullPointerException and no replication seems to take place. SEVERE: java.lang.NullPointerException at org.apache.solr.handler.ReplicationHandler$4.postCommit(ReplicationHandler.java:922) at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:78) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:411) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:169) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:336) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:239) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:741) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:213) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) This is the master config:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string -->
    <str name="replicateAfter">commit</str>
    <!-- Create a backup after 'optimize'. Other values can be 'commit', 'startup'. It is possible to have multiple entries of this config string. Note that this is just for backup; replication does not require this. -->
    <!-- <str name="backupAfter">optimize</str> -->
    <!-- If configuration files need to be replicated, give the names here, separated by commas -->
    <str name="confFiles">solrconfig_slave.xml:solrconfig.xml,schema.xml,synonyms.txt,stopwords.txt,elevate.xml</str>
    <!-- The default value of reservation is 10 secs. See the documentation below. Normally you should not need to specify this -->
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>

and... the slave config:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- Fully qualified url for the replication handler of the master. It is possible to pass this as a request param for the fetchindex command -->
    <str name="masterUrl">http://hostname.obscured.com:8080/solr/calendar_core/replication</str>
    <!-- Interval at which the slave should poll the master. Format is HH:mm:ss. If this is absent, the slave does not poll automatically, but a fetchindex can be triggered from the admin page or the http API -->
    <str name="pollInterval">00:00:20</str>
    <!-- THE FOLLOWING PARAMETERS ARE USUALLY NOT REQUIRED -->
    <!-- to use compression while transferring the index files. The possible values are internal|external
Re: Fastest way to use solrj
2010/1/19 Tim Terlegård tim.terleg...@gmail.com: There are a few ways to use solrj. I just learned that I can use the javabin format to get some performance gain. But when I try the binary format, nothing is added to the index. This is how I try to use it:

server = new CommonsHttpSolrServer("http://localhost:8983/solr")
server.setRequestWriter(new BinaryRequestWriter())
request = new UpdateRequest()
request.setAction(UpdateRequest.ACTION.COMMIT, true, true)
request.setParam("stream.file", "/tmp/data.bin")
request.process(server)

Should this work? Could there be something wrong with the file? I haven't found a good reference for how to create a javabin file, but by reading the source code I came up with this (Groovy code): BinaryRequestWriter does not read from a file and post it

fieldId = new NamedList()
fieldId.add("name", "id")
fieldId.add("val", "9-0")
fieldId.add("boost", null)
fieldText = new NamedList()
fieldText.add("name", "text")
fieldText.add("val", "Some text")
fieldText.add("boost", null)
fieldNull = new NamedList()
fieldNull.add("boost", null)
doc = [fieldNull, fieldId, fieldText]
docs = [doc]
root = new NamedList()
root.add("docs", docs)
fos = new FileOutputStream("data.bin")
new JavaBinCodec().marshal(root, fos)

I haven't found any examples of using stream.file like this with a binary file. Is it supported? Is it better/faster to use StreamingUpdateSolrServer and send everything over HTTP instead? Would code for that look something like this?

while (moreDocs) {
  xmlDoc = readDocFromFileUsingSaxParser()
  doc = new SolrInputDocument()
  doc.addField("id", "9-0")
  doc.addField("text", "Some text")
  server.add(doc)
}

To me it instinctively looks as if stream.file would be faster, because it doesn't have to use HTTP and it doesn't have to create a bunch of SolrInputDocument objects. /Tim -- Noble Paul | Systems Architect | AOL | http://aol.com
Re: DIH delta import - last modified date
While invoking the delta-import you may pass the value as a request parameter. That value can be used in the query as ${dih.request.xyz}, where xyz is the request parameter name.

On Wed, Jan 20, 2010 at 1:15 AM, Yao Ge yao...@gmail.com wrote:
> I am struggling with the concept of delta import in DIH. According to the
> documentation, the delta import will automatically record the last index
> time stamp and make it available for use in the delta query. However, in
> many cases where the last_modified time stamp in the database lags behind
> the current time, the last index time stamp is not good for the delta
> query. Can I pick a different mechanism to generate last_index_time, using
> a time stamp computed from the database (such as from a column of the
> database)?
>
> --
> View this message in context: http://old.nabble.com/DIH-delta-import---last-modified-date-tp27231449p27231449.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Noble Paul | Systems Architect | AOL | http://aol.com
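As a sketch of the suggestion above, a deltaQuery could read a cutoff passed at import time instead of ${dih.last_index_time}. The parameter name (lastMod) and the table/column names here are hypothetical, not from the thread:

```xml
<!-- Hedged sketch: "lastMod" is a request parameter invented here, and the
     table/column names are placeholders for the poster's schema. -->
<entity name="product"
        deltaQuery="select product_id from product_modified
                    where last_modified &gt; TO_DATE('${dih.request.lastMod}', 'YYYY-MM-DD HH24:MI:SS')"
        deltaImportQuery="select * from product
                          where product_id = '${dih.delta.product_id}'">
</entity>
```

The import would then be triggered with something like `/dataimport?command=delta-import&lastMod=2010-01-19 20:00:00` (URL-encoded), so the cutoff can be computed from the database rather than from Solr's recorded index time.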
Re: Fastest way to use solrj
2010/1/20 Tim Terlegård tim.terleg...@gmail.com:
>> BinaryRequestWriter does not read from a file and post it
>
> Is there any other way or is this use case not supported? I tried this:
>
>     $ curl host/solr/update/javabin -F stream.file=/tmp/data.bin
>     $ curl host/solr/update -F stream.body='<commit/>'
>
> Solr did read the file, because Solr complained when the file wasn't in
> the format the JavaBinUpdateRequestCodec expected. But no data is added
> to the index for some reason.
>
>> how did you create the file /tmp/data.bin ? what is the format?
>
> I wrote this in the first email. It's in the javabin format (I think).
> I did it like this (groovy code):
>
>     fieldId = new NamedList()
>     fieldId.add("name", "id")
>     fieldId.add("val", "9-0")
>     fieldId.add("boost", null)
>     fieldText = new NamedList()
>     fieldText.add("name", "text")
>     fieldText.add("val", "Some text")
>     fieldText.add("boost", null)
>     fieldNull = new NamedList()
>     fieldNull.add("boost", null)
>     doc = [fieldNull, fieldId, fieldText]
>     docs = [doc]
>     root = new NamedList()
>     root.add("docs", docs)
>     fos = new FileOutputStream("data.bin")
>     new JavaBinCodec().marshal(root, fos)
>
> /Tim

JavaBin is a format. Use this method:

    JavaBinUpdateRequestCodec#marshal(UpdateRequest updateRequest, OutputStream os)

The output of this can be posted to Solr and it should work.

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Replication Handler Severe Error: Unable to move index file
is it a one-off case? do you observe this frequently?

On Thu, Jan 21, 2010 at 11:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
> It's hard to tell without poking around, but one of the first things I'd
> do would be to look for /home/solr/cores/core8/index.20100119103919/_6qv.fnm
> - does this file/dir really exist? Or, rather, did it exist when the error
> happened. I'm not looking at the source code now, but is that really the
> only error you got? No exception stack trace?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
> ----- Original Message ----
> From: Trey solrt...@gmail.com
> To: solr-user@lucene.apache.org
> Sent: Wed, January 20, 2010 11:54:43 PM
> Subject: Replication Handler Severe Error: Unable to move index file
>
>> Does anyone know what would cause the following error?
>>
>>     10:45:10 AM org.apache.solr.handler.SnapPuller copyAFile
>>     SEVERE: Unable to move index file from: /home/solr/cores/core8/index.20100119103919/_6qv.fnm
>>     to: /home/solr/cores/core8/index/_6qv.fnm
>>
>> This occurred a few days back and we noticed that several full copies of
>> the index were subsequently pulled from the master to the slave,
>> effectively evicting our live index from RAM (the linux os cache), and
>> killing our query performance due to disk io contention. Has anyone
>> experienced this behavior recently? I found an old thread about this
>> error from early 2009, but it looks like it was patched almost a year ago:
>> http://old.nabble.com/%22Unable-to-move-index-file%22-error-during-replication-td21157722.html
>>
>> Additional relevant information:
>> - We are using the Solr 1.4 official release + a field collapsing patch
>>   from mid December (which I believe should only affect the query side,
>>   not indexing / replication).
>> - Our replication pollInterval for slaves checking the master is very
>>   small (15 seconds).
>> - We have a multi-box distributed search with each box possessing
>>   multiple cores.
>> - We issue a manual (rolling) optimize across the cores on the master
>>   once a day (occurred ~1-2 hours before the above timeline).
>> - maxWarmingSearchers is set to 1.

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Replication Handler Severe Error: Unable to move index file
On Fri, Jan 22, 2010 at 4:24 AM, Trey solrt...@gmail.com wrote:
> Unfortunately, when I went back to look at the logs this morning, the log
> file had been blown away... that puts a major damper on my debugging
> capabilities - so sorry about that. As a double whammy, we optimize
> nightly, so the old index files have completely changed at this point.
>
> I do not remember seeing an exception / stack trace in the logs associated
> with the SEVERE "Unable to move file" entry, but we were grepping the
> logs, so if it was output onto another line it could possibly have been
> there. I wouldn't really expect to see anything based upon the code in
> SnapPuller.java:
>
>     /**
>      * Copy a file by the File#renameTo() method. If it fails, it is considered a failure
>      * <p/>
>      * Todo may be we should try a simple copy if it fails
>      */
>     private boolean copyAFile(File tmpIdxDir, File indexDir, String fname, List<String> copiedfiles) {
>       File indexFileInTmpDir = new File(tmpIdxDir, fname);
>       File indexFileInIndex = new File(indexDir, fname);
>       boolean success = indexFileInTmpDir.renameTo(indexFileInIndex);
>       if (!success) {
>         LOG.error("Unable to move index file from: " + indexFileInTmpDir + " to: " + indexFileInIndex);
>         for (String f : copiedfiles) {
>           File indexFile = new File(indexDir, f);
>           if (indexFile.exists())
>             indexFile.delete();
>         }
>         delTree(tmpIdxDir);
>         return false;
>       }
>       return true;
>     }
>
> In terms of whether this is a one-off case: this is the first occurrence
> of this I have seen in the logs. We tried to replicate the conditions
> under which the exception occurred, but were unable to. I'll send along
> some more useful info if this happens again.
>
> In terms of the behavior we saw: it appears that a replication occurred
> and the "Unable to move file" error occurred. As a result, it looks like
> the ENTIRE index was subsequently replicated again into a temporary
> directory (several times, over and over). The end result was that we had
> multiple full copies of the index in temporary index folders on the
> slave, and the original still couldn't be updated (the move to ./index
> wouldn't work). Does Solr ever hold files open in a manner that would
> prevent a file in the index directory from being overridden?

There is a TODO which says we may try a manual copy if the move (renameTo) fails. We never did it because we never observed renameTo failing.

> 2010/1/21 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
>> is it a one-off case? do you observe this frequently?
>> [snip - quoted exchange repeated above]

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
There is no corresponding DataSource which can be used with TikaEntityProcessor that reads from a BLOB. I have opened an issue: https://issues.apache.org/jira/browse/SOLR-1737

On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal ns...@columnit.com wrote:
> Hi, I am fairly new to Solr and would like to use the DIH to pull rich
> text files (pdfs, etc) from BLOB fields in my database. There was a
> suggestion made to use the FieldReaderDataSource with the recently
> committed TikaEntityProcessor. Has anyone accomplished this?
>
> This is my configuration, and the resulting error - I'm not sure if I'm
> using the FieldReaderDataSource correctly. If anyone could shed light on
> whether I am going the right direction or not, it would be appreciated.
>
> --- data-config.xml:
>
>     <dataConfig>
>       <datasource name="f1" type="FieldReaderDataSource"/>
>       <dataSource name="orcle" driver="oracle.jdbc.driver.OracleDriver"
>                   url="jdbc:oracle:thin:un/p...@host:1521:sid"/>
>       <document>
>         <entity dataSource="orcle" name="attach"
>                 query="select id as name, attachment from testtable2">
>           <entity dataSource="f1" processor="TikaEntityProcessor"
>                   dataField="attach.attachment" format="text">
>             <field column="text" name="NAME"/>
>           </entity>
>         </entity>
>       </document>
>     </dataConfig>
>
> --- Debug error:
>
>     <response>
>       <lst name="responseHeader">
>         <int name="status">0</int>
>         <int name="QTime">203</int>
>       </lst>
>       <lst name="initArgs">
>         <lst name="defaults">
>           <str name="config">testdb-data-config.xml</str>
>         </lst>
>       </lst>
>       <str name="command">full-import</str>
>       <str name="mode">debug</str>
>       <null name="documents"/>
>       <lst name="verbose-output">
>         <lst name="entity:attach">
>           <lst name="document#1">
>             <str name="query">select id as name, attachment from testtable2</str>
>             <str name="time-taken">0:0:0.32</str>
>             <str>--- row #1-</str>
>             <str name="NAME">java.math.BigDecimal:2</str>
>             <str name="ATTACHMENT">oracle.sql.BLOB:oracle.sql.b...@1c8e807</str>
>             <str>-</str>
>             <lst name="entity:253433571801723">
>               <str name="EXCEPTION">
>     org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1
>         at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(DataImporter.java:279)
>         at org.apache.solr.handler.dataimport.ContextImpl.getDataSource(ContextImpl.java:93)
>         at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:97)
>         at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
>         at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
>         at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
>         at org.mortbay.jetty.Server.handle(Server.java:285)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
Re: Fastest way to use solrj
if you write only a few docs you may not observe much difference in size. if you write a large number of docs you may observe a big difference.

2010/1/27 Tim Terlegård tim.terleg...@gmail.com:
> I got the binary format to work perfectly now. Performance is better
> than with xml. Thanks! Although, it doesn't look like a binary file is
> smaller in size than an xml file?
>
> /Tim
>
> 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
>> 2010/1/21 Tim Terlegård tim.terleg...@gmail.com:
>>> Yes, it worked! Thank you very much. But do I need to use curl or can
>>> I use CommonsHttpSolrServer or StreamingUpdateSolrServer? If I can't
>>> use BinaryWriter then I don't know how to do this.
>>
>> if your data is serialized using JavaBinUpdateRequestCodec, you may POST
>> it using curl. If you are writing directly, use CommonsHttpSolrServer.
>> [snip - earlier messages in this thread]

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Fastest way to use solrj
how many fields are there in each doc? the binary format just reduces overhead. it does not touch/compress the payload.

2010/1/27 Tim Terlegård tim.terleg...@gmail.com:
> I have 3 million documents, each having 5000 chars. The xml file is
> about 15GB. The binary file is also about 15GB. I was a bit surprised
> about this. It doesn't bother me much though. At least it performs
> better.
>
> /Tim
> [snip - earlier messages in this thread]

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Fastest way to use solrj
The binary format just reduces overhead. In your case, all the data is in the big text field, which is not compressed. But overall, the parsing is a lot faster for the binary format, so you see a perf boost.

2010/1/27 Tim Terlegård tim.terleg...@gmail.com:
> I have 6 fields. The text field is the biggest, it contains almost all
> of the 5000 chars.
>
> /Tim
> [snip - earlier messages in this thread]

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Help using CachedSqlEntityProcessor
cacheKey and cacheLookup are required attributes.

On Thu, Jan 28, 2010 at 12:51 AM, KirstyS kirst...@gmail.com wrote:
> Thanks. I am on 1.4, so maybe that is the problem. Will try when I get
> back to work tomorrow. Thanks
>
> Rolf Johansson-2 wrote:
>> I recently had issues with CachedSqlEntityProcessor too, figuring out
>> how to use the syntax. After a while, I managed to get it working with
>> cacheKey and cacheLookup. I think this is 1.4 specific though. It seems
>> you have double WHERE clauses, one in the query and one in the where
>> attribute. Try using cacheKey and cacheLookup instead in something like
>> this:
>>
>>     <entity name="LinkedCategory" pk="LinkedCatArticleId"
>>             query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId
>>                    FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)"
>>             processor="CachedSqlEntityProcessor"
>>             cacheKey="LINKEDCATARTICLEID"
>>             cacheLookup="article.CMSARTICLEID"
>>             deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)
>>                         WHERE convert(varchar(50), LastUpdateDate) &gt; '${dataimporter.article.last_index_time}'
>>                         OR convert(varchar(50), PublishDate) &gt; '${dataimporter.article.last_index_time}'"
>>             parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock)">
>>       <field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
>>     </entity>
>>
>> /Rolf
>>
>> Den 2010-01-27 12.36, skrev KirstyS kirst...@gmail.com:
>>> Hi, I have looked on the wiki. Using the CachedSqlEntityProcessor looked
>>> simple. But I am getting no speed benefit and am not sure if I have even
>>> got the syntax correct. I have a main root entity called 'article', and
>>> then I have a number of sub entities. One such entity is as such:
>>>
>>>     <entity name="LinkedCategory" pk="LinkedCatAricleId"
>>>             query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId
>>>                    FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)
>>>                    WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')"
>>>             processor="CachedSqlEntityProcessor"
>>>             WHERE="LinkedCatArticleId = article.CmsArticleId"
>>>             deltaQuery="SELECT LinkedCategoryBC FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)
>>>                         WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')
>>>                         AND (convert(varchar(50), LastUpdateDate) &gt; '${dataimporter.article.last_index_time}'
>>>                         OR convert(varchar(50), PublishDate) &gt; '${dataimporter.article.last_index_time}')"
>>>             parentDeltaQuery="SELECT * from vArticleSummaryDetail_SolrSearch (nolock)
>>>                               WHERE convert(varchar(50), CmsArticleId) = convert(varchar(50), '${article.CmsArticleId}')">
>>>       <field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
>>>     </entity>
>>>
>>> As you can see I have added (for the main query - not worrying about the
>>> delta queries yet!!) the processor and the 'where', but not sure if it's
>>> correct. Can anyone point me in the right direction??? Thanks Kirsty
>>>
>>> --
>>> View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27345412.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Help using CachedSqlEntityProcessor
Thanks for pointing this out. The wiki had a problem for a while and we could not update the documentation. It is updated here: http://wiki.apache.org/solr/DataImportHandler#cached

On Thu, Jan 28, 2010 at 6:31 PM, KirstyS kirst...@gmail.com wrote:
> Thanks, I saw that mistake and I have it working now!!! Thank you for all
> your help. Out of interest, is the cacheKey and cacheLookup documented
> anywhere?
>
> Rolf Johansson-2 wrote:
>> It's always a good thing if you can check the debug log (fx catalina.out)
>> or run with debug/verbose to check how Solr runs through the dataconfig.
>> You've also made a typo in the pk and query, LinkedCatAricleId is missing
>> a t.
>>
>> /Rolf
>>
>> Den 2010-01-28 11.20, skrev KirstyS kirst...@gmail.com:
>>> Okay, I changed my entity to look like this (have included my main
>>> entity as well):
>>>
>>>     <document name="ArticleDocument">
>>>       <entity name="article" pk="CmsArticleId"
>>>               query="Select * from vArticleSummaryDetail_SolrSearch (nolock) WHERE ArticleStatusId = 1">
>>>         <entity name="LinkedCategory" pk="LinkedCatAricleId"
>>>                 query="SELECT LinkedCategoryBC, CmsArticleId as LinkedCatAricleId
>>>                        FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)"
>>>                 processor="CachedSqlEntityProcessor"
>>>                 cacheKey="LinkedCatArticleId"
>>>                 cacheLookup="article.CmsArticleId">
>>>         </entity>
>>>       </entity>
>>>     </document>
>>>
>>> BUT now the index is taking SO much longer. Have I missed any other
>>> configuration changes? Do I need to add anything into the solrconfig.xml
>>> file? Do I have my syntax completely wrong? Any help is greatly
>>> appreciated!!!
>>>
>>> --
>>> View this message in context: http://old.nabble.com/Help-using-CachedSqlEntityProcessor-tp27337635p27355501.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.

--
Noble Paul | Systems Architect | AOL | http://aol.com
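Distilled from the exchange above, a working cached child entity might look roughly like this; the table and column names come from the poster's own schema (not from Solr), and the key spellings are normalized to match each other, which is the point of the fix:

```xml
<!-- Sketch based on the configs in this thread. cacheKey names a column
     returned by this entity's query; cacheLookup references a column of
     the parent entity. Table/column names are the poster's, not Solr's. -->
<entity name="article" pk="CmsArticleId"
        query="SELECT * FROM vArticleSummaryDetail_SolrSearch (nolock) WHERE ArticleStatusId = 1">
  <entity name="LinkedCategory"
          processor="CachedSqlEntityProcessor"
          query="SELECT LinkedCategoryBC, CmsArticleId AS LinkedCatArticleId
                 FROM LinkedCategoryBreadCrumb_SolrSearch (nolock)"
          cacheKey="LinkedCatArticleId"
          cacheLookup="article.CmsArticleId">
    <field column="LinkedCategoryBC" name="LinkedCategoryBreadCrumb"/>
  </entity>
</entity>
```

The child query runs once and its rows are cached in memory, then joined per parent row via cacheKey/cacheLookup; the cacheKey spelling must match the alias in the query exactly.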
Re: Solr 1.4 Replication index directories
the index.20100127044500/ is a temp directory and should have been cleaned up if there was no problem in replication (see the logs if there was a problem). If there is a problem, the temp directory will be used as the new index directory and the old one will no longer be used. At any given point only one directory is used for the index; check the replication dashboard to see which one it is. Everything else can be deleted.

On Fri, Jan 29, 2010 at 6:03 AM, mark angelillo li...@snooth.com wrote:
> Thanks, Otis. Responses inline.
>
> Hi, We're using the new replication and it's working pretty well. There's
> one detail I'd like to get some more information about. As the replication
> works, it creates versions of the index in the data directory. Originally
> we had index/, but now there are dated versions such as
> index.20100127044500/, which are the replicated versions. Each copy is
> sized in the vicinity of 65G. With our current hard drive it's fine to
> have two around, but 3 gets a little dicey. Sometimes we're finding that
> the replication doesn't always clean up after itself. I would like to
> understand this better, or to not have this happen. It could be a
> configuration issue. Some more specific questions:
>
> - Is it safe to remove the index/ directory (that doesn't have the date
>   on it)? I think I tried this once and the whole thing broke, however
>   maybe something else was wrong at the time.
>
> No, that's the real, live index, you don't want to remove that one.
>
> Yeah... I tried it once and remember things breaking. However nothing in
> this directory has been modified for over a week (since the last
> replication initialization). And I'm still sitting on 130GB of data for
> what is only 65GB on the master.
>
> - Is there a way to know which one is the current one? (I'm looking at
>   the file index.properties, and it seems to be correct, but sometimes
>   there's a newer version in the directory, which later is removed)
>
> I think the index one is always current, no?
> If not, I imagine the admin replication page will tell you, or even the
> Statistics page, e.g.:
>
>     reader : SolrIndexReader{this=46a55e,r=readonlysegmentrea...@46a55e,segments=1}
>     readerDir : org.apache.lucene.store.NIOFSDirectory@/mnt/solrhome/cores/foo/data/index
>
>     reader : SolrIndexReader{this=5c3aef1,r=readonlydirectoryrea...@5c3aef1,refCnt=1,segments=9}
>     readerDir : org.apache.lucene.store.NIOFSDirectory@/home/solr/solr_1.4/solr/data/index.20100127044500
>
> - Could it be that the index does not finish replicating in the poll
>   interval I give it? What happens if, say, there's a poll interval X and
>   replicating the index happens to take longer than X sometimes? (Our
>   current poll interval is 45 minutes, and every time I'm watching it it
>   completes in time.)
>
> you can keep a very small pollInterval and it is OK. if a replication is
> going on, no new replication will be initiated till the old one completes
>
> I think only 1 replication will/should be happening at a time.
>
> Whew, that's comforting.

--
Noble Paul | Systems Architect | AOL | http://aol.com
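For reference, the pointer to the live directory mentioned above is kept in data/index.properties on the slave; a typical file looks something like the following (the timestamp is a hypothetical example):

```properties
# Written by the replication SnapPuller when the index directory changes;
# names the directory currently in use under data/.
index=index.20100127044500
```

If this key is absent or the file does not exist, the plain data/index directory is the one in use; any other index.* directory not named here is leftover and can be deleted.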
Re: loading an updateProcessorChain with multicore in trunk
I guess default=true should not be necessary if there is only one updateRequestProcessorChain specified. Open an issue.

On Fri, Jan 29, 2010 at 6:06 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
> I am testing trunk and have seen a different behaviour when loading
> updateProcessors which I don't know if it's normal (at least with
> multicore). Before, I used an updateProcessorChain this way:
>
>     <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>       <lst name="defaults">
>         <str name="update.processor">myChain</str>
>       </lst>
>     </requestHandler>
>
>     <updateRequestProcessorChain name="myChain">
>       <processor class="org.apache.solr.update.processor.CustomUpdateProcessorFactory"/>
>       <processor class="org.apache.solr.update.processor.LogUpdateProcessorFactory"/>
>       <processor class="org.apache.solr.update.processor.RunUpdateProcessorFactory"/>
>     </updateRequestProcessorChain>
>
> It does not work in current trunk. I have debugged the code and have seen
> that the UpdateProcessorChain is now loaded via:
>
>     public <T> T initPlugins(List<PluginInfo> pluginInfos, Map<String, T> registry, Class<T> type, String defClassName) {
>       T def = null;
>       for (PluginInfo info : pluginInfos) {
>         T o = createInitInstance(info, type, type.getSimpleName(), defClassName);
>         registry.put(info.name, o);
>         if (info.isDefault()) {
>           def = o;
>         }
>       }
>       return def;
>     }
>
> As I don't have default=true in the configuration, my custom processor
> chain is not used. Setting default=true makes it work:
>
>     <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>       <lst name="defaults">
>         <str name="update.processor">myChain</str>
>       </lst>
>     </requestHandler>
>
>     <updateRequestProcessorChain name="myChain" default="true">
>       <processor class="org.apache.solr.update.processor.CustomUpdateProcessorFactory"/>
>       <processor class="org.apache.solr.update.processor.LogUpdateProcessorFactory"/>
>       <processor class="org.apache.solr.update.processor.RunUpdateProcessorFactory"/>
>     </updateRequestProcessorChain>
>
> As far as I understand, if you specify the chain you want to use here:
>
>     <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>       <lst name="defaults">
>         <str name="update.processor">myChain</str>
>       </lst>
>     </requestHandler>
>
> it shouldn't be necessary to set it as default. Is it going to be kept
> this way? Thanks in advance
>
> --
> View this message in context: http://old.nabble.com/loading-an-updateProcessorChain-with-multicore-in-trunk-tp27371375p27371375.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: DataImportHandler problem - reading XML from a file
It is clear that the xpaths provided won't fetch anything, because there is no data at those paths. What do you really wish to be indexed? On Sun, Jan 31, 2010 at 10:30 AM, Lance Norskog goks...@gmail.com wrote: This DataImportHandler script does not find any documents in this HTML file. The DIH definitely opens the file, but either the XPathEntityProcessor gets no data or it does not recognize the xpaths described. Any hints? (I'm using Solr 1.5-dev, sometime recent.) Thanks! Lance

xhtml-data-config.xml:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="xhtml" forEach="/html/head | /html/body"
            processor="XPathEntityProcessor" pk="id"
            transformer="TemplateTransformer"
            url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html">
      <field column="head_s" xpath="/html/head" />
      <field column="body_s" xpath="/html/body" />
    </entity>
  </document>
</dataConfig>

Sample data file: cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html

<?xml version="1.0" encoding="UTF-8" ?>
<html>
  <head>
    <meta content="en-US" name="DC.language" />
  </head>
  <body>
    <div id="header">
      <a href="ch05-tokenizers-filters-Solr1.4.html">First</a>
      <span class="nolink">Previous</span>
      <a href="ch05-tokenizers-filters-Solr1.41.html">Next</a>
      <a href="ch05-tokenizers-filters-Solr1.460.html">Last</a>
    </div>
    <div dir="ltr" id="content" style="background-color:transparent">
      <h1 id="toc0">
        <span class="SectionNumber">1</span>
        <a id="RefHeading36402771"></a>
        <a id="bkmRefHeading36402771"></a>
        Understanding Analyzers, Tokenizers, and Filters
      </h1>
    </div>
  </body>
</html>

-- Lance Norskog goks...@gmail.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
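One way to actually pull text out of a file like this is to point the fields at text-bearing nodes, or to use flatten="true" so a field collects the text of every descendant element. The sketch below is hedged: it reuses the file path and field names from the config in this thread, flatten is a standard XPathEntityProcessor field attribute, but whether this captures exactly what is wanted is an assumption:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="xhtml" processor="XPathEntityProcessor"
            forEach="/html"
            url="/cygwin/tmp/ch05-tokenizers-filters-Solr1.4.html">
      <!-- grab an attribute value from the head -->
      <field column="head_s" xpath="/html/head/meta/@content" />
      <!-- flatten="true" concatenates the text of every element under body,
           so the nested divs, anchors, and h1 are all captured -->
      <field column="body_s" xpath="/html/body" flatten="true" />
    </entity>
  </document>
</dataConfig>
```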
Re: replication setup
It is always recommended to paste your actual configuration and startup commands, instead of saying "as described in the wiki". On Tue, Jan 26, 2010 at 9:52 PM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi, I have set up replication following the wiki. I downloaded the latest apache-solr-1.4 release and exploded it in 2 different directories. I modified both solrconfig.xml files, for the master and the slave, as described on the wiki page. In both directories, I started Solr from the example directory. On the master: java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8983 -DSTOP.PORT=8078 -DSTOP.KEY=stop.now -jar start.jar and on the slave: java -Dsolr.solr.home=multicore -Djetty.host=0.0.0.0 -Djetty.port=8982 -DSTOP.PORT=8077 -DSTOP.KEY=stop.now -jar start.jar I can see core0 and core1 when I open the solr url. However, I don't see a replication link, and the url <solr url>/replication returns a 404 error. I must be doing something wrong. I would appreciate any help! thanks a lot matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
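For comparison, the wiki's minimal replication setup looks like this. This is a hedged sketch assuming a single core mounted at /solr; with -Dsolr.solr.home=multicore the handler has to be declared in each core's solrconfig.xml, and the slave's masterUrl must include the core name (e.g. http://master-host:8983/solr/core0/replication), which would also explain a 404 on <solr url>/replication:

```xml
<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```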
Re: DataImportHandler delta-import confusion
Try deltaImportQuery="select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.moment_id}'". The key has to be the same, and in the same case. On Tue, Feb 2, 2010 at 1:45 AM, Jon Drukman jdruk...@gmail.com wrote: First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag: (some SELECT statements reduced to increase readability)

<entity name="moment"
        query="select ..."
        deltaQuery="select moment_id from moments where date_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.MOMENTID}'"
        pk="MOMENTID"
        transformer="TemplateTransformer">

When I look at the MySQL query log I see the date modified query running fine and returning 3 rows. The deltaImportQuery, however, does not have the proper primary key in the where clause. It's just blank. I also tried changing it to ${moment.MOMENTID}. I don't really get the relation between the pk field and the ${dataimport.delta.whatever} stuff. Help please! -jsd- -- - Noble Paul | Systems Architect| AOL | http://aol.com
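The fix can be shown side by side. A hedged sketch of the entity (the column list was elided in the original, so `title` and the field mapping here are made-up stand-ins): since deltaQuery selects `moment_id`, the variable exposed to deltaImportQuery is `${dataimporter.delta.moment_id}` with exactly that spelling and case:

```xml
<entity name="moment" pk="moment_id"
        query="select moment_id, title from moments"
        deltaQuery="select moment_id from moments
                    where date_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select moment_id, title from moments
                          where moment_id = '${dataimporter.delta.moment_id}'">
  <field column="moment_id" name="id" />
</entity>
```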
Re: DataImportHandler delta-import confusion
Please do not hijack a thread. http://people.apache.org/~hossman/#threadhijack On Tue, Feb 2, 2010 at 11:32 PM, Leann Pereira le...@1sourcestaffing.com wrote: Hi Paul, Can you take me off this distribution list? Thanks, Leann From: noble.p...@gmail.com [noble.p...@gmail.com] On Behalf Of Noble Paul നോബിള് नोब्ळ् [noble.p...@corp.aol.com] Sent: Tuesday, February 02, 2010 2:12 AM To: solr-user@lucene.apache.org Subject: Re: DataImportHandler delta-import confusion try deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.moment_id}' The key has to be same and in the same case On Tue, Feb 2, 2010 at 1:45 AM, Jon Drukman jdruk...@gmail.com wrote: First, let me just say that DataImportHandler is fantastic. It got my old mysql-php-xml index rebuild process down from 30 hours to 6 minutes. I'm trying to use the delta-import functionality now but failing miserably. Here's my entity tag: (some SELECT statements reduced to increase readability) entity name=moment query=select ... deltaQuery=select moment_id from moments where date_modified '${dataimporter.last_index_time}' deltaImportQuery=select [bunch of stuff] WHERE m.moment_id = '${dataimporter.delta.MOMENTID}' pk=MOMENTID transformer=TemplateTransformer When I look at the MySQL query log I see the date modified query running fine and returning 3 rows. The deltaImportQuery, however, does not have the proper primary key in the where clause. It's just blank. I also tried changing it to ${moment.MOMENTID}. I don't really get the relation between the pk field and the ${dataimport.delta.whatever} stuff. Help please! -jsd- -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler - convertType attribute
Implicit conversion can cause problems when Transformers are applied. It is hard for a user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. If you wish to do numeric operations on a field, convertType will cause problems. If it is explicitly set, the user knows why the type got changed. On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered a blob indexing problem and found the convertType solution in the FAQ http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67 in the mailing list: "We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using ResultSet#getObject for all columns (making the old behavior a configurable option – convertType in JdbcDataSource)." Why is it error prone? Is it safe enough to enable convertType for all jdbc data sources by default? What are the side effects? Thanks in advance, Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com
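For concreteness, convertType is switched on per data source. A hedged sketch (driver, url, and credentials are placeholders): convertType="true" asks JdbcDataSource to cast each column to the type of the corresponding Solr field instead of passing through whatever ResultSet#getObject returns:

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost:3306/mydb"
            user="user" password="pass"
            convertType="true" />
```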
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? It is a feature of JdbcDataSource, and no other dataSource offers it. We offer it because JDBC drivers have a mechanism to do type conversion. What do you mean by "it is too broad"? Erik On Feb 3, 2010, at 1:16 AM, Noble Paul നോബിള് नोब्ळ् wrote: Implicit conversion can cause problems when Transformers are applied. It is hard for a user to guess the type of the field by looking at the schema.xml. In Solr, String is the most commonly used type. If you wish to do numeric operations on a field, convertType will cause problems. If it is explicitly set, the user knows why the type got changed. On Tue, Feb 2, 2010 at 6:38 PM, Alexey Serba ase...@gmail.com wrote: Hello, I encountered a blob indexing problem and found the convertType solution in the FAQ http://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 I was wondering why it is not enabled by default and found the following comment http://www.lucidimagination.com/search/document/169e6cc87dad5e67/dataimporthandler_and_blobs#169e6cc87dad5e67 in the mailing list: "We used to attempt type conversion from the SQL type to the field's given type. We found that it was error prone and switched to using ResultSet#getObject for all columns (making the old behavior a configurable option – convertType in JdbcDataSource)." Why is it error prone? Is it safe enough to enable convertType for all jdbc data sources by default? What are the side effects? Thanks in advance, Alex -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: DataImportHandler - convertType attribute
On Wed, Feb 3, 2010 at 4:16 PM, Erik Hatcher erik.hatc...@gmail.com wrote: On Feb 3, 2010, at 5:36 AM, Noble Paul നോബിള് नोब्ळ् wrote: On Wed, Feb 3, 2010 at 3:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: One thing I find awkward about convertType is that it is JdbcDataSource specific, rather than field-specific. Isn't the current implementation far too broad? It is a feature of JdbcDataSource, and no other dataSource offers it. We offer it because JDBC drivers have a mechanism to do type conversion. What do you mean by "it is too broad"? I mean the convertType flag is not field-specific (or at least field-overridable). Conversions occur on a per-field basis, but the setting is for the entire data source and thus all fields. Yes, that is true. First of all, this is not very widely used, so fine-tuning did not make sense. Erik -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: java.lang.NullPointerException with MySQL DataImportHandler
On Thu, Feb 4, 2010 at 10:50 AM, Lance Norskog goks...@gmail.com wrote: I just tested this with a DIH that does not use database input. If the DataImportHandler JDBC code does not support a schema that has optional fields, that is a major weakness. Noble/Shalin, is this true? The problem is obviously not with DIH. DIH blindly passes on all the fields it could obtain from the DB. If some field is missing, DIH does not do anything with it. On Tue, Feb 2, 2010 at 8:50 AM, Sascha Szott sz...@zib.de wrote: Hi, since some of the fields used in your DIH configuration aren't mandatory (e.g., keywords and tags are defined as nullable in your db table schema), add a default value to all optional fields in your schema configuration (e.g., default = ). Note that Solr does not understand the db-related concept of null values. Solr's log output SolrInputDocument[{keywords=keywords(1.0)={Dolce}, name=name(1.0)={Dolce & Gabbana D&G Neckties designer Tie for men 543}, productID=productID(1.0)={220213}}] indicates that there aren't any tags or descriptions stored for the item with productId 220213. Since no default value is specified, Solr raises an error when creating the index document. -Sascha Jean-Michel Philippon-Nadeau wrote: Hi, Thanks for the reply.
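Sascha's suggestion translates to schema.xml roughly like this. A hedged sketch: the field names come from the thread, but the field type "text" and the empty-string default are assumptions:

```xml
<!-- optional DB columns get a default so a missing value
     never aborts document creation -->
<field name="keywords" type="text" indexed="true" stored="true" default="" />
<field name="tags"     type="text" indexed="true" stored="true" default="" />
```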
On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote: * the output of MySQL's describe command for all tables/views referenced in your DIH configuration

mysql> describe products;
| Field          | Type             | Null | Key | Default | Extra          |
| productID      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| skuCode        | varchar(320)     | YES  | MUL | NULL    |                |
| upcCode        | varchar(320)     | YES  | MUL | NULL    |                |
| name           | varchar(320)     | NO   |     | NULL    |                |
| description    | text             | NO   |     | NULL    |                |
| keywords       | text             | YES  |     | NULL    |                |
| disqusThreadID | varchar(50)      | NO   |     | NULL    |                |
| tags           | text             | YES  |     | NULL    |                |
| createdOn      | int(10) unsigned | NO   |     | NULL    |                |
| lastUpdated    | int(10) unsigned | NO   |     | NULL    |                |
| imageURL       | varchar(320)     | YES  |     | NULL    |                |
| inStock        | tinyint(1)       | YES  | MUL | 1       |                |
| active         | tinyint(1)       | YES  |     | 1       |                |
13 rows in set (0.00 sec)

mysql> describe product_soldby_vendor;
| Field           | Type             | Null | Key | Default | Extra |
| productID       | int(10) unsigned | NO   | MUL | NULL    |       |
| productVendorID | int(10) unsigned | NO   | MUL | NULL    |       |
| price           | double           | NO   |     | NULL    |       |
| currency        | varchar(5)       | NO   |     | NULL    |       |
| buyURL          | varchar(320)     | NO   |     | NULL    |       |
5 rows in set (0.00 sec)

mysql> describe products_vendors_subcategories;
| Field                      | Type             | Null | Key | Default | Extra          |
| productVendorSubcategoryID | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| productVendorCategoryID    | int(10) unsigned | NO   |     | NULL    |                |
| labelEnglish               | varchar(320)     | NO   |     | NULL    |                |
| labelFrench                | varchar(320)     | NO   |     | NULL    |                |
4 rows in set (0.00 sec)

mysql> describe products_vendors_categories;
| Field                   | Type             | Null | Key | Default | Extra          |
| productVendorCategoryID | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| labelEnglish            | varchar(320)     | NO   |     | NULL    |                |
| labelFrench             | varchar(320)     | NO   |     | NULL    |                |
3 rows in set (0.00 sec)

mysql> describe product_vendor_in_subcategory;
| Field | Type | Null | Key | Default |
Re: DataImportHandler TikaEntityProcessor FieldReaderDataSource
unfortunately, no On Fri, Feb 5, 2010 at 2:23 PM, Jorg Heymans jorg.heym...@gmail.com wrote: dow, thanks for that Paul :-| I suppose schema validation for data-config.xml is already in Jira somewhere ? Jorg 2010/2/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com wrong datasource name=orablob type=FieldStreamDataSource / right dataSource name=orablob type=FieldStreamDataSource / On Thu, Feb 4, 2010 at 9:27 PM, Jorg Heymans jorg.heym...@gmail.com wrote: Hi, I'm having some troubles getting this to work on a snapshot from 3rd feb My config looks as follows dataSource name=ora driver=oracle.jdbc.OracleDriver url= / datasource name=orablob type=FieldStreamDataSource / document name=mydoc entity dataSource=ora name=meta query=select id, filename, bytes from documents field column=ID name=id / field column=FILENAME name=filename / entity dataSource=orablob processor=TikaEntityProcessor url=bytes dataField=meta.BYTES field column=text name=mainDocument/ /entity /entity /document and i get this stacktrace org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: bytes Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) It seems that whatever is in the url attribute it is trying to execute as a query. So i thought i put url=select bytes from documents where id = ${meta.ID} but then i get a classcastexception. 
Caused by: java.lang.ClassCastException: org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1 at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:98) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:233) Any ideas what is wrong with the config ? Thanks Jorg 2010/1/27 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com There is no corresponding DataSurce which can be used with TikaEntityProcessor which reads from BLOB I have opened an issue.https://issues.apache.org/jira/browse/SOLR-1737 On Mon, Jan 25, 2010 at 10:57 PM, Shah, Nirmal ns...@columnit.com wrote: Hi, I am fairly new to Solr and would like to use the DIH to pull rich text files (pdfs, etc) from BLOB fields in my database. There was a suggestion made to use the FieldReaderDataSource with the recently commited TikaEntityProcessor. Has anyone accomplished this? This is my configuration, and the resulting error - I'm not sure if I'm using the FieldReaderDataSource correctly. If anyone could shed light on whether I am going the right direction or not, it would be appreciated. 
---Data-config.xml: dataConfig datasource name=f1 type=FieldReaderDataSource / dataSource name=orcle driver=oracle.jdbc.driver.OracleDriver url=jdbc:oracle:thin:un/p...@host:1521:sid / document entity dataSource=orcle name=attach query=select id as name, attachment from testtable2 entity dataSource=f1 processor=TikaEntityProcessor dataField=attach.attachment format=text field column=text name=NAME / /entity /entity /document /dataConfig -Debug error: response lst name=responseHeader int name=status0/int int name=QTime203/int /lst lst name=initArgs lst name=defaults str name=configtestdb-data-config.xml/str /lst /lst str name=commandfull-import/str str name=modedebug/str null name=documents/ lst name=verbose-output lst name=entity:attach lst name=document#1 str name=queryselect id as name, attachment from testtable2/str str name=time-taken0:0:0.32/str str--- row #1-/str str name=NAMEjava.math.BigDecimal:2/str str name=ATTACHMENToracle.sql.BLOB:oracle.sql.b...@1c8e807/str str-/str lst name=entity:253433571801723 str name=EXCEPTION org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :f1 available for entity :253433571801723 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImporter.getDataSourceInstance(Da
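Putting the fixes from this thread together, a working BLOB-plus-Tika configuration would look roughly like this. This is a hedged sketch built from the config above with the casing corrected (note <dataSource>, not <datasource>) and the Oracle connection details left as placeholders:

```xml
<dataConfig>
  <dataSource name="ora" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@host:1521:sid" user="user" password="pass" />
  <!-- FieldStreamDataSource streams the BLOB column to Tika -->
  <dataSource name="orablob" type="FieldStreamDataSource" />
  <document name="mydoc">
    <entity dataSource="ora" name="meta"
            query="select id, filename, bytes from documents">
      <field column="ID" name="id" />
      <field column="FILENAME" name="filename" />
      <entity dataSource="orablob" processor="TikaEntityProcessor"
              url="bytes" dataField="meta.BYTES" format="text">
        <field column="text" name="mainDocument" />
      </entity>
    </entity>
  </document>
</dataConfig>
```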
Re: DataImportHandlerException for custom DIH Transformer
On Mon, Feb 8, 2010 at 9:13 AM, Tommy Chheng tommy.chh...@gmail.com wrote: I'm having trouble making a custom DIH transformer in Solr 1.4. I compiled the general TrimTransformer into a jar (just copy/paste sample code from http://wiki.apache.org/solr/DIHCustomTransformer). I placed the jar along with the dataimporthandler jar in solr/lib (same directory as the jetty jar). Do not keep it in solr/lib; it won't work. Keep it in {solr.home}/lib. Then I added to my DIH data-config.xml file: transformer="DateFormatTransformer, RegexTransformer, com.chheng.dih.transformers.TrimTransformer" Now I get this exception when I try running the import. org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodException: com.chheng.dih.transformers.TrimTransformer.transformRow(java.util.Map) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.loadTransformers(EntityProcessorWrapper.java:120) I noticed the exception lists TrimTransformer.transformRow(java.util.Map), but the abstract Transformer class defines a two-parameter method: transformRow(Map<String, Object> row, Context context)? -- Tommy Chheng Programmer and UC Irvine Graduate Student Twitter @tommychheng http://tommy.chheng.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: How to configure multiple data import types
Are you referring to nested entities? http://wiki.apache.org/solr/DIHQuickStart#Index_data_from_multiple_tables_into_Solr On Mon, Feb 8, 2010 at 5:42 PM, stefan.ma...@bt.com wrote: I have got a dataimport request handler configured to index data by selecting data from a DB view. I now need to index additional data sets from other views so that I can support other search queries. I defined additional entity definitions within the document section of my data-config.xml, but I only seem to pull in data for the 1st entity and not both. Is there an xsd (or dtd) for data-config.xml, schema.xml, solrconfig.xml? These might help with understanding how to construct usable conf files. Regards Stefan Maric BT Innovate Design | Collaboration Platform - Customer Innovation Solutions -- - Noble Paul | Systems Architect| AOL | http://aol.com
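If the goal is several independent views, each producing its own documents (rather than nested child entities), multiple root-level entities under one <document> is the usual shape. A hedged sketch with made-up view and field names:

```xml
<document>
  <!-- each root-level entity generates its own set of documents;
       a full-import with no "entity" request parameter runs them all,
       while entity=productView would run just one -->
  <entity name="productView" pk="id"
          query="select id, name from product_view">
    <field column="id" name="id" />
    <field column="name" name="name" />
  </entity>
  <entity name="orderView" pk="id"
          query="select id, descr from order_view">
    <field column="id" name="id" />
    <field column="descr" name="description" />
  </entity>
</document>
```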
Re: DIH: delta-import not working
Try this: deltaImportQuery="select id, bytes from attachment where application = 'MYAPP' and id = '${dataimporter.delta.id}'" Be aware that the names are case sensitive. If the id comes back as 'ID', this will not work. On Tue, Feb 9, 2010 at 3:15 PM, Jorg Heymans jorg.heym...@gmail.com wrote: Hi, I am having problems getting the delta-import to work for my schema. Following what I have found in the list, jira and the wiki, the configuration below should just work, but it doesn't.

<dataConfig>
  <dataSource name="ora" driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@." user="" password=""/>
  <dataSource name="orablob" type="FieldStreamDataSource" />
  <document name="mydocuments">
    <entity dataSource="ora" name="attachment" pk="id"
            query="select id, bytes from attachment where application = 'MYAPP'"
            deltaImportQuery="select id, bytes from attachment where application = 'MYAPP' and id = '${dataimporter.attachment.id}'"
            deltaQuery="select id from attachment where application = 'MYAPP' and modified_on &gt; to_date('${dataimporter.attachment.last_index_time}', 'yyyy-mm-dd hh24:mi:ss')">
      <field column="id" name="attachmentId" />
      <entity dataSource="orablob" processor="TikaEntityProcessor"
              url="bytes" dataField="attachment.bytes">
        <field column="text" name="attachmentContents" />
      </entity>
    </entity>
  </document>
</dataConfig>

The sql generated in the deltaQuery is correct, and the timestamp is passed correctly. When I execute that query manually in the DB it returns the pk of the rows that were added. However, no documents are added to the index. What am I missing here?? I'm using a build snapshot from 03/02. Thanks Jorg -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Call URL, simply parse the results using SolrJ
You can also try:

URL urlo = new URL(url); // ensure that the url has wt=javabin in it
NamedList<Object> namedList = new JavaBinCodec().unmarshal(urlo.openConnection().getInputStream());
QueryResponse response = new QueryResponse(namedList, null);

On Mon, Feb 8, 2010 at 11:49 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Here's what I did to resolve this:

XMLResponseParser parser = new XMLResponseParser();
URL urlo = new URL(url);
InputStreamReader isr = new InputStreamReader(urlo.openConnection().getInputStream());
NamedList<Object> namedList = parser.processResponse(isr);
QueryResponse response = new QueryResponse(namedList, null);

On Mon, Feb 8, 2010 at 10:03 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: So here's what happens if I pass in a URL with parameters, SolrJ chokes: Exception in thread main java.lang.RuntimeException: Invalid base url for solrj. The base URL must not contain parameters: http://locahost:8080/solr/main/select?q=videoqt=dismax at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:205) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:180) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.init(CommonsHttpSolrServer.java:152) at org.apache.solr.util.QueryTime.main(QueryTime.java:20) On Mon, Feb 8, 2010 at 9:32 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Sorry for the poorly worded title... For SOLR-1761 I want to pass in a URL and parse the query response... However it's non-obvious to me how to do this using the SolrJ API, hence asking the experts here. :) -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr 1.4: Full import FileNotFoundException
Concurrent imports are not allowed in DIH, unless you set up multiple DIH instances. On Sat, Feb 13, 2010 at 7:05 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have noticed that when I run concurrent full-imports using DIH in Solr : 1.4, the index ends up getting corrupted. I see the following in the log I'm fairly confident that concurrent imports won't work -- but it shouldn't corrupt your index -- even if the DIH didn't actively check for this type of situation, the underlying Lucene LockFactory should ensure that one of the imports wins ... you'll need to tell us what kind of filesystem you are using, and show us the relevant settings from your solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, etc...) At worst you should get a lock timeout exception. : But I looked at: : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : and was under the impression that this issue was fixed in Solr 1.4. ...right, attempting to run two concurrent imports with DIH should cause the second one to abort immediately. -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
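Setting up multiple DIH instances just means registering the handler more than once in solrconfig.xml, each with its own config file. A hedged sketch (handler paths and file names are made up); each instance can then run one import at a time:

```xml
<requestHandler name="/dataimport-a"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-a.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport-b"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config-b.xml</str>
  </lst>
</requestHandler>
```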
Re: Solr 1.4: Full import FileNotFoundException
Can we confirm that the user does not have multiple DIH instances configured? Any request for an import while an import is going on is rejected. On Sat, Feb 13, 2010 at 11:40 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : concurrent imports are not allowed in DIH, unless u setup multiple DIH instances Right, but that's not the issue -- the question is whether attempting to do so might be causing index corruption (either because of a bug or because of some possibly really odd config we currently know nothing about) : : I have noticed that when I run concurrent full-imports using DIH in Solr : : 1.4, the index ends up getting corrupted. I see the following in the log : : I'm fairly confident that concurrent imports won't work -- but it : shouldn't corrupt your index -- even if the DIH didn't actively check for : this type of situation, the underlying Lucene LockFactory should ensure : that one of the imports wins ... you'll need to tell us what kind of : filesystem you are using, and show us the relevant settings from your : solrconfig (lock type, merge policy, indexDefaults, mainIndex, DIH, : etc...) : : At worst you should get a lock timeout exception. : : : But I looked at: : : http://old.nabble.com/dataimporthandler-and-multiple-delta-import-td19160129.html : : : : and was under the impression that this issue was fixed in Solr 1.4. : : ...right, attempting to run two concurrent imports with DIH should cause : the second one to abort immediately. : : : : : -Hoss : : : : : : -- : - : Noble Paul | Systems Architect| AOL | http://aol.com : -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Preventing mass index delete via DataImportHandler full-import
On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I have a small worry though. When I call the full-import functions, can : I configure Solr (via the XML files) to make sure there are rows to : index before wiping everything? What worries me is if, for some unknown : reason, we have an empty database, then the full-import will just wipe : the live index and the search will be broken. I believe if you set clear=false when doing the full-import, DIH won't delete the entire index before it starts.

it is clean=false, or use command=import instead of command=full-import

It probably makes the full-import slower (most of the adds wind up being deletes followed by adds) but it should prevent you from having an empty index if something goes wrong with your DB. The big catch is you now have to be responsible for managing deletes (using the XmlUpdateRequestHandler) yourself ... this bug looks like its goal is to make this easier to deal with (but it's not really clear to me what deletedPkQuery is ... it doesn't seem to be documented). https://issues.apache.org/jira/browse/SOLR-1168 -Hoss -- - Noble Paul | Systems Architect| AOL | http://aol.com
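The non-destructive variant is issued as a request parameter. A hedged sketch (host, port, and handler path are assumptions about the local setup):

```
# keeps existing documents instead of wiping the index first
http://localhost:8983/solr/dataimport?command=full-import&clean=false
```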
Re: @Field annotation support
solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation? -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: replications issue
What is the problem? Is the replication not happening after you do a commit on the master? Frequent polling is not a problem; frequent commits can slow down the system. On Fri, Feb 19, 2010 at 2:41 PM, giskard gisk...@autistici.org wrote: Ciao, Uhm, after some time a new index in data/index on the slave has been written with the ~size of the master index. The configuration on both master and slave is the same one on the SolrReplication wiki page (enable/disable master/slave in a node):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://localhost:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

When the master is started, pass in -Denable.master=true, and in the slave pass in -Denable.slave=true. Alternately, these values can be stored in a solrcore.properties file as follows:

#solrcore.properties in master
enable.master=true
enable.slave=false

On 19 Feb 2010, at 03:43, Otis Gospodnetic wrote: giskard, Is this on the master or on the slave(s)? Maybe you can paste your replication handler config for the master and your replication handler config for the slave. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ From: giskard gisk...@autistici.org To: solr-user@lucene.apache.org Sent: Thu, February 18, 2010 12:16:37 PM Subject: replications issue Hi all, I've setup solr replication as described in the wiki. When I start the replication, a directory called index.$numbers is created; after a while it disappears and a new index.$othernumbers is created. index/ remains untouched with an empty index. Any clue? Thank you in advance, Riccardo -- ciao, giskard -- ciao, giskard -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: @Field annotation support
On Fri, Feb 19, 2010 at 11:41 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Ok then, is this the correct class to support the @Field annotation? Because I have it on the path but its not working. yes , it is the right class. But, what is not working? org\apache\solr\solr-solrj\1.4.0\solr-solrj-1.4.0.jar/org\apache\solr\client\solrj\beans\Field.class 2010/2/18 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: solrj jar On Thu, Feb 18, 2010 at 10:52 PM, Pulkit Singhal pulkitsing...@gmail.com wrote: Hello All, When I use Maven or Eclipse to try and compile my bean which has the @Field annotation as specified in http://wiki.apache.org/solr/Solrj page ... the compiler doesn't find any class to support the annotation. What jar should we use to bring in this custom Solr annotation? -- - Noble Paul | Systems Architect| AOL | http://aol.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Using XSLT with DIH for a URLDataSource
The XSLT file looks fine. Is the location of the file correct? On Mon, Feb 22, 2010 at 2:57 PM, Roland Villemoes r...@alpha-solutions.dk wrote: Hi (thanks a lot). Yes, the full stacktrace is this:

22-02-2010 08:37:00 org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Error initializing XSL Processing Document # 1
 at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:103)
 at org.apache.solr.handler.dataimport.XPathEntityProcessor.init(XPathEntityProcessor.java:76)
 at org.apache.solr.handler.dataimport.EntityProcessorWrapper.init(EntityProcessorWrapper.java:71)
 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:319)
 at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
 at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
 at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
 at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:203)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:619) Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:825) at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:614) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initXpathReader(XPathEntityProcessor.java:98) ... 24 more 22-02-2010 08:37:00 org.apache.solr.update.DirectUpdateHandler2 rollback My import feed (for testing is this): ?xml version='1.0' encoding='utf-8'? products product id='738' rank='10' brand id='48'![CDATA[World's Best]]/brandname![CDATA[Kontakt Cream-Special 4 x 10]]/name categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17' name![CDATA[Jeans]]/name /category /category category id='8' name![CDATA[Nyheder]]/name /category /categories description![CDATA[4 pakker med 10 stk. 
glatte kondomer, med reservoir og creme.]]/descriptionprice currency='SEK'310.70/pricesalesprice currency='SEK'233.03/salespricecolor id='227'![CDATA[4 x 10 kondomer]]/colorsize id='6'![CDATA[Large]]/sizeproductUrl![CDATA[http://www.website.se/butik/visvare.asp?id=738]]/productUrlimageUrl![CDATA[http://www.website.se/varebilleder/738_intro.jpg]]/imageUrllastmodified11-11-2008 15:10:31/lastmodified/product product id='320' rank='10' categories primarycategory='17' category id='7' name![CDATA[Jeans Bukser]]/name category id='17' name![CDATA[Jeans]]/name /category /category category id='8' name![CDATA[Nyheder]]/name /category /categories brand id='1'![CDATA[JBS]]/brandname![CDATA[JBS trusser]]/namecategory id='39'![CDATA[Trusser]]/categorydescription![CDATA[Gråmeleret JBS trusser model Classic med gylp.]]/descriptionprice currency='SEK'154.96/pricesalesprice currency='SEK'154.96/salespricecolor id='28'![CDATA[Gråmeleret]]/colorsize
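For reference, a minimal data-config wiring an XSLT into a URLDataSource import looks roughly like the following sketch. The feed URL and file names are made up; the xsl value is opened as a URL or a file-system path (which is exactly the open-URL-or-file behaviour discussed later in this thread), and useSolrAddSchema is only appropriate if the stylesheet emits standard <add><doc> output:

```xml
<dataConfig>
  <dataSource type="URLDataSource" encoding="UTF-8"/>
  <document>
    <!-- xsl points at the stylesheet; a bad location surfaces as the
         "Error initializing XSL" / "Could not compile stylesheet" error above -->
    <entity name="products"
            processor="XPathEntityProcessor"
            url="http://www.example.com/feed.xml"
            xsl="xslt/products.xsl"
            useSolrAddSchema="true"/>
  </document>
</dataConfig>
```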
Re: error while using the DIH handler
Can you paste the DIH part of your solrconfig.xml? On Tue, Feb 23, 2010 at 7:01 PM, Na_D nabam...@zaloni.com wrote: Yes, I did check the location of the data-config.xml; it's in the folder example-DIH/solr/db/conf -- View this message in context: http://old.nabble.com/error-while-using-the-DIH-handler-tp27702772p2770.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Using XSLT with DIH for a URLDataSource
You are right. The StreamSource class is not throwing the proper exception. Do we really have to handle this? On Thu, Feb 25, 2010 at 9:06 AM, Lance Norskog goks...@gmail.com wrote: [Taken off the list] The problem is that the XSLT code swallows the real exception, and does not return it as the deeper exception. To show the right error, the code would open a file name or a URL directly. The problem is, the code has to throw an exception on a file or a URL, try the other, then decide what to do.

try {
    URL u = new URL(xslt);
    iStream = u.openStream();
} catch (MalformedURLException e) {
    iStream = new FileInputStream(new File(xslt));
}
TransformerFactory transFact = TransformerFactory.newInstance();
xslTransformer = transFact.newTransformer(new StreamSource(iStream));

On Mon, Feb 22, 2010 at 6:24 AM, Roland Villemoes r...@alpha-solutions.dk wrote: You're right! It was as simple (stupid!) as that. Thanks a lot (for your time .. very appreciated) Roland -Original message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On behalf of Noble Paul നോബിള്‍ नोब्ळ् Sent: 22 February 2010 14:01 To: solr-user@lucene.apache.org Subject: Re: Using XSLT with DIH for a URLDataSource The xslt file looks fine. Is the location of the file correct?
Re: Using XSLT with DIH for a URLDataSource
This is the only place this should be a problem; 'xsl' is not a very commonly used attribute. On Fri, Feb 26, 2010 at 10:46 AM, Lance Norskog goks...@gmail.com wrote: There could be a common 'open an URL' utility method. This would help make the DIH components consistent.
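Lance's URL-or-file point can be sketched as a small standalone class. The names here (XsltSource, isUrl) are illustrative, not DIH code: the idea is to decide up front whether the locator parses as a URL, so that a later IOException from the real open propagates instead of being swallowed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

public class XsltSource {
    // Decide whether a resource locator is a URL or a file path.
    // new URL(...) throws MalformedURLException when no protocol is present.
    static boolean isUrl(String loc) {
        try {
            new URL(loc);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Open the stream; any real IOException from the chosen branch
    // propagates to the caller instead of being swallowed.
    static InputStream open(String loc) throws IOException {
        return isUrl(loc) ? new URL(loc).openStream()
                          : new FileInputStream(new File(loc));
    }

    public static void main(String[] args) {
        System.out.println(isUrl("http://example.com/products.xsl")); // true
        System.out.println(isUrl("/conf/xslt/products.xsl"));         // false
    }
}
```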
Re: If you could have one feature in Solr...
On Wed, Feb 24, 2010 at 7:18 PM, Patrick Sauts patrick.via...@gmail.com wrote: Synchronisation between the slaves to switch the new index at the same time after replication. I shall open an issue for this, and let us figure out how best it should be done: https://issues.apache.org/jira/browse/SOLR-1800
Re: replication issue
The data/index.20100226063400 dir is a temporary dir and is created in the same dir where the index dir is located. I'm wondering if the symlink is causing the problem. Why don't you set the data dir as /raid/data instead of /solr/data? On Sat, Feb 27, 2010 at 12:13 AM, Matthieu Labour matthieu_lab...@yahoo.com wrote: Hi, I am still having issues with the replication and wonder if things are working properly. I have 1 master and 1 slave. On the slave, I deleted the data/index directory and the data/replication.properties file and restarted Solr. When the slave is pulling data from the master, I can see that the size of the data directory is growing:

r...@slr8:/raid/data# du -sh
3.7M .
r...@slr8:/raid/data# du -sh
4.7M .

and I can see that the data/replication.properties file got created, and also a directory data/index.20100226063400. Soon after, index.20100226063400 disappears and the size of data/index is back to 12K:

r...@slr8:/raid/data/index# du -sh
12K .

And when I look at the number of documents via the admin interface, I still see 0 documents, so I feel something is wrong. One more thing: I have a symlink for /solr/data --> /raid/data. Thank you for your help! matt -- - Noble Paul | Systems Architect| AOL | http://aol.com
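Following Noble's suggestion, one way to take the symlink out of the picture (assuming a stock solrconfig.xml) is to point dataDir straight at the physical location:

```xml
<!-- solrconfig.xml: use the physical path rather than the /solr/data symlink -->
<dataDir>/raid/data</dataDir>
```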
Re: If you could have one feature in Solr...
On Fri, Mar 5, 2010 at 4:34 AM, Mark Miller markrmil...@gmail.com wrote: On 03/04/2010 05:56 PM, Chris Hostetter wrote: : The ability to read solr configuration files from the classpath instead of : solr.solr.home directory. Solr has always supported this. When SolrResourceLoader.openResource is asked to open a resource, it first checks if it's an absolute path -- if it's not, then it checks relative to the conf dir (under whatever the instanceDir is, ie: Solr Home in a single core setup), then it checks relative to the current working dir, and if it still can't find it, it checks via the current ClassLoader. That said: it's not something that a lot of people have ever taken advantage of, so it wouldn't surprise me if some features in Solr are buggy because they try to open files directly w/o utilizing openResource -- in particular a quick test of the trunk example using...

java -Djetty.class.path=./solr/conf -Dsolr.solr.home=/tmp/new-solr-home -jar start.jar

...seems to suggest that QueryElevationComponent isn't using openResource to look for elevate.xml (I set solr.solr.home in that line so Solr would *NOT* attempt to look at ./solr ... it does need some sort of Solr Home, but in this case it was a completely empty directory) -Hoss

I've been trying to think of ways to tackle this. I hate getConfigDir - it lets anyone just get around the ResourceLoader, basically. It would be awesome to get rid of it somehow - it would make ZooKeeperSolrResourceLoader so much easier to get working correctly across the board. Why not just get rid of it? Components depending on filesystems are a big headache. The main thing I'm hung up on is how to update a file - some code I've seen uses getConfigDir to update files, e.g. you get the content of solrconfig, then you want to update it and reload the core. Most other things, I think, are doable without getConfigDir. QueryElevationComponent is actually sort of simple to get around - we just need to add an exists method that returns true/false depending on whether the resource exists. QEC just uses getConfigDir to do an exists check on the elevate.xml - if it's not there, it looks in the data dir. -- - Mark http://www.lucidimagination.com -- - Noble Paul | Systems Architect| AOL | http://aol.com
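A simplified sketch of the lookup order Hoss describes: absolute path, then the conf dir, then the classloader (the current-working-dir step is omitted for brevity). The class and method names are invented for illustration; this is not SolrResourceLoader itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceLookup {
    // Resolve a resource name the way the thread describes:
    // absolute path -> conf dir -> context classloader.
    static InputStream openResource(Path confDir, String name) throws IOException {
        Path p = Paths.get(name);
        if (p.isAbsolute() && Files.exists(p)) {
            return Files.newInputStream(p);          // 1) absolute path
        }
        Path inConf = confDir.resolve(name);
        if (Files.exists(inConf)) {
            return Files.newInputStream(inConf);     // 2) relative to conf dir
        }
        InputStream cp = Thread.currentThread()
                               .getContextClassLoader().getResourceAsStream(name);
        if (cp == null) {
            throw new IOException("Can't find resource '" + name + "'");
        }
        return cp;                                   // 3) classloader fallback
    }

    // Demo: a relative name like "elevate.xml" is found via the conf-dir step.
    static boolean foundInConfDir() throws IOException {
        Path conf = Files.createTempDirectory("conf");
        Files.write(conf.resolve("elevate.xml"), "<elevate/>".getBytes());
        try (InputStream in = openResource(conf, "elevate.xml")) {
            return in != null;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(foundInConfDir()); // true
    }
}
```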
Re: Is it possible to use ODBC with DIH?
If you have a JDBC-ODBC bridge driver, it should be fine. On Sun, Mar 7, 2010 at 4:52 AM, JavaGuy84 bbar...@gmail.com wrote: Hi, I have an ODBC driver for MetaMatrix DB (Redhat). I am trying to figure out a way to use DIH with the DSN that has been created on my machine with that ODBC driver. Is it possible to specify a DSN in DIH and index the DB? If it's possible, can you please let me know the ODBC URL that I need to enter for the datasource in the DIH data-config.xml? Thanks, Barani -- View this message in context: http://old.nabble.com/Is-it-possible-to-use-ODBC-with-DIH--tp27808016p27808016.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: XPath Processing Applied to Clob
Keep in mind that the XPath is case-sensitive; paste a sample XML. What is dataField=d.text? It does not seem to refer to anything -- where is the enclosing entity? Did you mean dataField=doc.text? xpath="//BODY" is a supported syntax as long as you are using Solr 1.4 or higher. On Thu, Mar 18, 2010 at 3:15 AM, Neil Chaudhuri nchaudh...@potomacfusion.com wrote: Incidentally, I tried adding this:

<datasource name="f" type="FieldReaderDataSource"/>
<document>
  <entity dataSource="f" processor="XPathEntityProcessor" dataField="d.text" forEach="/MESSAGE">
    <field column="body" xpath="//BODY"/>
  </entity>
</document>

But this didn't seem to change anything. Any insight is appreciated. Thanks. From: Neil Chaudhuri Sent: Wednesday, March 17, 2010 3:24 PM To: solr-user@lucene.apache.org Subject: XPath Processing Applied to Clob I am using the DataImportHandler to index 3 fields in a table: an id, a date, and the text of a document. This is an Oracle database, and the document is an XML document stored as Oracle's xmltype data type. Since this is nothing more than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. However, I don't want to index/store all the XML but instead just the XML within a set of tags. The XPath itself is trivial, but it seems like the XPathEntityProcessor only works for XML file content rather than the output of a Transformer. Here is what I currently have that fails:

<document>
  <entity name="doc" transformer="ClobTransformer"
          query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
    <field column="EFFECTIVE_DT" name="effectiveDate"/>
    <field column="ARCHIVE_ID" name="id"/>
    <field column="TEXT" name="text" clob="true"/>
    <entity name="text" processor="XPathEntityProcessor" forEach="/MESSAGE" url="${doc.text}">
      <field column="body" xpath="//BODY"/>
    </entity>
  </entity>
</document>

Is there an easy way to do this without writing my own custom transformer? Thanks. -- - Noble Paul | Systems Architect| AOL | http://aol.com
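Putting Noble's corrections together, a version that feeds the XPathEntityProcessor from the parent row via FieldReaderDataSource might look like the following sketch. The dataSource names and elided connection attributes are illustrative; the key change is dataField="doc.text", which references the parent entity by name:

```xml
<dataSource name="db" driver="oracle.jdbc.driver.OracleDriver" url="..." user="..." password="..."/>
<dataSource name="fld" type="FieldReaderDataSource"/>
<document>
  <entity name="doc" dataSource="db" transformer="ClobTransformer"
          query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
    <field column="EFFECTIVE_DT" name="effectiveDate"/>
    <field column="ARCHIVE_ID" name="id"/>
    <field column="TEXT" name="text" clob="true"/>
    <!-- the child entity reads its XML from the parent's field, not from a url -->
    <entity name="body" dataSource="fld" processor="XPathEntityProcessor"
            dataField="doc.text" forEach="/MESSAGE">
      <field column="body" xpath="//BODY"/>
    </entity>
  </entity>
</document>
```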
Re: DIH best pratices question
On Sat, Mar 27, 2010 at 3:25 AM, Blargy zman...@hotmail.com wrote: I have an items table on db1 and an item_descriptions table on db2. The items table is very small in the sense that it has small columns, while the item_descriptions table has a very large text field column. Both tables are around 7 million rows. What is the best way to import these into one document?

<document>
  <entity name="item" ...>
    <entity name="item_descriptions" ...>
    </entity>
  </entity>
</document>

this is the right way

Or

<document>
  <entity name="item_descriptions" rootEntity="false">
    <entity name="item" ...>
    </entity>
  </entity>
</document>

Or is there an alternative way? Maybe using the second way with a CachedSqlEntityProcessor for the item entity? I don't think CachedSqlEntityProcessor helps here. Any thoughts are greatly appreciated. Thanks! -- View this message in context: http://n3.nabble.com/DIH-best-pratices-question-tp677568p677568.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul | Systems Architect| AOL | http://aol.com
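Filled in with concrete (hypothetical) column names, the recommended parent/child layout from the first option could look like this; each parent row triggers one child query joined on the parent's id:

```xml
<document>
  <entity name="item" dataSource="db1"
          query="select ITEM_ID, TITLE from items">
    <field column="ITEM_ID" name="id"/>
    <field column="TITLE" name="title"/>
    <entity name="item_descriptions" dataSource="db2"
            query="select DESCRIPTION from item_descriptions where item_id='${item.ITEM_ID}'">
      <field column="DESCRIPTION" name="description"/>
    </entity>
  </entity>
</document>
```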
Re: expungeDeletes on commit in Dataimport
On Thu, Mar 25, 2010 at 10:14 PM, Ruben Chadien ruben.chad...@aspiro.com wrote: Hi, I know this has been discussed before, but is there any way to do expungeDeletes=true when the DataImportHandler does the commit? expungeDeletes=true is not used. That does not mean that the doc does not get deleted; deleteDocByQuery does not do a commit, so if you wish to commit you should do it explicitly. I am using deleteDocByQuery in a Transformer when doing a delta-import, and as discussed before the documents are not deleted until restart. Also, how do I know in a Transformer whether it's running a delta or a full import? I tried looking at Context.currentProcess(), but that gives me FULL_DUMP when doing a delta-import...? The variable ${dataimporter.request.command} tells you which command is being run. Thanks! Ruben Chadien -- - Noble Paul | Systems Architect| AOL | http://aol.com
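If an explicit commit with expunged deletes is needed after the delta-import, one option (assuming the stock /update handler on the standard example port) is to post a commit message directly:

```xml
<!-- POSTed to http://localhost:8983/solr/update -->
<commit expungeDeletes="true"/>
```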
Re: ReplicationHandler reports incorrect replication failures
please create a bug On Fri, Mar 26, 2010 at 7:29 PM, Shawn Smith ssmit...@gmail.com wrote: We're using Solr 1.4 Java replication, which seems to be working nicely. While writing production monitors to check that replication is healthy, I think we've run into a bug in the status reporting of the ../solr/replication?command=details command. (I know it's experimental...) Our monitor parses the replication?command=details XML and checks that replication lag is reasonable by diffing the indexVersion of the master and slave indices to make sure it's within a reasonable time range. Our monitor also compares the first elements of indexReplicatedAtList and replicationFailedAtList lists to see if the last replication attempt failed. This is where we're having a problem with the monitor throwing false errors. It looks like there's a bug that causes successful replications to be considered failures. The bug is triggered immediately after a slave restarts when the slave is already in sync with the master. Each no-op replication attempt after restart is considered a failure until something on the master changes and replication has to actually do work. From the code, it looks like SnapPuller.successfulInstall starts out false on restart. If the slave starts out in sync with the master, then each no-op replication poll leaves successfulInstall set to false which makes SnapPuller.logReplicationTimeAndConfFiles log the poll as a failure. SnapPuller.successfulInstall stays false until the first time replication actually has to do something, at which point it gets set to true, and then everything is OK. Thanks, Shawn -- - Noble Paul | Systems Architect| AOL | http://aol.com
Re: Solr indexing not taking all values from DB.
The DIH status says 10 rows, which means only 10 rows got fetched for that query. Do you have any custom transformers which eat up rows? Try the debug page of DIH and see what is happening to the rest of the rows. On Fri, Oct 10, 2008 at 5:32 PM, con [EMAIL PROTECTED] wrote: A simple question: I performed the following steps to index data from an Oracle db to the Solr index and then search: a) I have the configurations for indexing data from an Oracle db b) started the server. c) Did a full-import: http://localhost:8983/solr/dataimport?command=full-import But when I do a search using http://localhost:8983/solr/select/?q= not all the result sets that match the search string are displayed. 1) Are the above steps enough for getting db values into the Solr index? My configurations (data-config.xml and schema.xml) are quite correct, because I am getting SOME of the result sets as search results (not all). 2) Is there some value in solrconfig.xml, or some other file, that limits the number of items being indexed? [For the time being I have only a few hundred records in my db.] The query that I am specifying in data-config yields around 25 results if I execute it in an Oracle client, whereas the status of full-import is something like:

<str name="status">idle</str>
<str name="importResponse">Configuration Re-loaded sucessfully</str>
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">1</str>
  <str name="Total Rows Fetched">10</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2008-10-10 17:29:03</str>
  <str name="Time taken ">0:0:0.513</str>
</lst>

-- View this message in context: http://www.nabble.com/Solr-indexing-not-taking-all-values-from-DB.-tp19916938p19916938.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: Solr indexing not taking all values from DB.
template transformer does not eat up rows. I am almost sure that the query returns only 10 rows in that case. Could you write a quick JDBC program and verify that (not the Oracle client)? Everything else looks fine. On Sat, Oct 11, 2008 at 4:52 PM, con [EMAIL PROTECTED] wrote: Hi Noble, Thanks for your reply. In my data-config.xml I have:

<entity name="employees" transformer="TemplateTransformer"
        query="Select EMP_ID, EMP_NAME, NVL(COMMENT,'-Nil-') as COMMENT from EMPLOYEES">
  <field column="rowtype" template="employees"/>
  <field column="EMP_ID" name="EMP_ID"/>
  <field column="EMP_NAME" name="EMP_NAME"/>
  <field column="COMMENT" name="COMMENT"/>
</entity>
<entity name="customers" transformer="TemplateTransformer"
        query="Select CUST_ID, CUST_NAME, NVL(COMMENT,'-Nil-') as COMMENT from CUSTOMERS">
  <field column="rowtype" template="customers"/>
  <field column="CUST_ID" name="CUST_ID"/>
  <field column="CUST_NAME" name="CUST_NAME"/>
  <field column="COMMENT" name="COMMENT"/>
</entity>

Is the TemplateTransformer the one restricting the result set count to 10? Where can I find that out? I need the TemplateTransformer because I want to query the responses of either one of these at a time using a URL like http://localhost:8983/solr/select/?q=(Bob%20AND%20rowtype:customers)&version=2.2&start=0&rows=10&indent=on&wt=json I tried in the debug mode (http://localhost:8983/solr/dataimport?command=full-import&debug=on&verbose=on), but it is not at all mentioning anything after the 10th document. Thanks and regards, con
Re: Solr indexing not taking all values from DB.
in debug mode it writes only 10 because there is a rows parameter which is by default set to 10. Make it 100 or so and you should be seeing all docs. But in non-debug mode there is no such parameter. On Sun, Oct 12, 2008 at 11:00 PM, con [EMAIL PROTECTED] wrote: I wrote a JDBC program to implement the same query, and it is returning all the responses, 25 nos. But Solr is still indexing only 10 rows. Is there any optimization setting by default in solrconfig.xml that restricts the responses to 10? thanks con.
Re: Solr indexing not taking all values from DB.
now just do a normal full-import, do not enable debug. I guess it should be just fine. On Mon, Oct 13, 2008 at 1:20 PM, con [EMAIL PROTECTED] wrote: Thanks Noble I tried in the debug mode with rows=100 and it is accepting all the result sets. So I suppose there is nothing wrong in the query. But I am not able to update the index since this is available only in the debug mode. Can you please give some suggestions based on this. thanks con Noble Paul നോബിള് नोब्ळ् wrote: in debug mode it writes only 10 because there is a rows parameter which is by default set to 10. Make it 100 or so and you should be seeing all docs. But in non-debug mode there is no such parameter. On Sun, Oct 12, 2008 at 11:00 PM, con [EMAIL PROTECTED] wrote: I wrote a jdbc program to implement the same query. But it is returning all the responses, 25 nos. But the solr is still indexing only 10 rows. Is there any optimization setting by default in the solrconfig.xml that restricts the responses to 10? thanks con. Noble Paul നോബിള് नोब्ळ् wrote: template transformer does not eat up rows. I am almost sure that the query returns only 10 rows in that case.
could you write a quick jdbc program and verify that (not the oracle client)? everything else looks fine. On Sat, Oct 11, 2008 at 4:52 PM, con [EMAIL PROTECTED] wrote: Hi Noble Thanks for your reply In my data-config.xml I have; entity name=employees transformer=TemplateTransformer query=Select EMP_ID , EMP_NAME , NVL (COMMENT ,'-Nil-') as COMMENT from EMPLOYEES field column=rowtype template=employees / field column=EMP_ID name=EMP_ID / field column=EMP_NAME name=EMP_NAME / field column=COMMENT name=COMMENT / /entity entity name=customers transformer=TemplateTransformer query=Select CUST_ID , CUST_NAME , NVL (COMMENT ,'-Nil-') as COMMENT from CUSTOMERS field column=rowtype template=customers / field column=CUST_ID name=CUST_ID / field column=CUST_NAME name=CUST_NAME / field column=COMMENT name=COMMENT / /entity Is this TemplateTransformer the one that is restricting the resultset count to 10? Where can I find that out? I need this TemplateTransformer because I want to query the responses of either one of these at a time using a URL like http://localhost:8983/solr/select/?q=(Bob%20AND%20rowtype:customers)&version=2.2&start=0&rows=10&indent=on&wt=json I tried the debug mode (http://localhost:8983/solr/dataimport?command=full-import&debug=on&verbose=on), but it is not at all mentioning anything after the 10th document. Thanks and regards con Noble Paul നോബിള് नोब्ळ् wrote: The DIH status says 10 rows which means only 10 rows got fetched for that query. Do you have any custom transformers which eat up rows? Try the debug page of DIH and see what is happening to the rest of the rows. On Fri, Oct 10, 2008 at 5:32 PM, con [EMAIL PROTECTED] wrote: A simple question: I performed the following steps to index data from an Oracle db to the solr index and then search: a) I have the configurations for indexing data from an Oracle db b) started the server.
c) Done a full-import: http://localhost:8983/solr/dataimport?command=full-import But when I do a search using http://localhost:8983/solr/select/?q= not all the result sets that match the search string are displayed. 1) Are the above steps enough for getting db values into the solr index? My configurations (data-config.xml and schema.xml) are quite correct because I am getting SOME of the result sets as search results (not all). 2) Is there some value in solrconfig.xml, or some other file, that limits the number of items being indexed? [For the time being I have only a few hundred records in my db.] The query that I am specifying in data-config yields around 25 results if I execute it in an Oracle client, whereas the status of full-import is something like: str name=statusidle/str str name=importResponseConfiguration Re-loaded successfully/str lst name=statusMessages str name=Total Requests made to DataSource1/str str name=Total Rows Fetched10/str str name=Total Documents Skipped0/str str name=Full Dump Started2008-10-10 17:29:03/str str name=Time taken 0:0:0.513/str /lst -- View this message in context: http://www.nabble.com/Solr-indexing-not-taking-all-values-from-DB.-tp19916938p19916938.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
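For readability, here are con's two entities from the quoted message, reconstructed as well-formed data-config XML (the archive stripped the markup; attribute quoting is reconstructed, and the dataSource line is a placeholder since the thread does not show it):

```xml
<dataConfig>
  <!-- dataSource attributes below are placeholders; the thread omits them -->
  <dataSource driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:..." user="..." password="..."/>
  <document>
    <entity name="employees" transformer="TemplateTransformer"
            query="Select EMP_ID, EMP_NAME, NVL(COMMENT, '-Nil-') as COMMENT from EMPLOYEES">
      <!-- TemplateTransformer stamps a constant rowtype into every row -->
      <field column="rowtype" template="employees"/>
      <field column="EMP_ID" name="EMP_ID"/>
      <field column="EMP_NAME" name="EMP_NAME"/>
      <field column="COMMENT" name="COMMENT"/>
    </entity>
    <entity name="customers" transformer="TemplateTransformer"
            query="Select CUST_ID, CUST_NAME, NVL(COMMENT, '-Nil-') as COMMENT from CUSTOMERS">
      <field column="rowtype" template="customers"/>
      <field column="CUST_ID" name="CUST_ID"/>
      <field column="CUST_NAME" name="CUST_NAME"/>
      <field column="COMMENT" name="COMMENT"/>
    </entity>
  </document>
</dataConfig>
```

The rowtype field is what allows filtering with queries like q=(Bob AND rowtype:customers).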
Re: Need Help, Can I query the index from command line
see an example here http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476 On Tue, Oct 14, 2008 at 9:17 PM, Erik Hatcher [EMAIL PROTECTED] wrote: Solr's new DataImportHandler can index RSS (and Atom should be fine too) feeds. Erik On Oct 14, 2008, at 11:37 AM, msizec wrote: Thank you for your help. I've just realized that Solr could not index pages from the web. I wonder if someone of you guys would know another open source search tool that could do this job: indexing pages (rss, atom feeds) from a URL list and let me query it from the command line so that I could know, in a script, which feed contains a keyword. Don't know if you will understand what I mean, but I hope so! Thanks! -- View this message in context: http://www.nabble.com/Need-Help%2C-Can-I-query-the-index-from-command-line-tp19974279p19975753.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
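The wiki example Noble links to feeds an RSS document through HttpDataSource and XPathEntityProcessor. A minimal sketch along those lines (the feed URL, forEach expression, and xpaths here are illustrative; check the linked wiki page for the exact syntax in your DIH version):

```xml
<dataConfig>
  <dataSource type="HttpDataSource"/>
  <document>
    <!-- url and xpath values are illustrative placeholders -->
    <entity name="feed"
            url="http://rss.slashdot.org/Slashdot/slashdot"
            processor="XPathEntityProcessor"
            forEach="/RDF/item">
      <field column="title" xpath="/RDF/item/title"/>
      <field column="link" xpath="/RDF/item/link"/>
      <field column="description" xpath="/RDF/item/description"/>
    </entity>
  </document>
</dataConfig>
```

Each item element in the feed becomes one Solr document, which could then be queried from a script via the normal select URL.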
Re: error with delta import
The query makes my head spin. Joining at the SQL level does not enable you to populate multivalued fields. Otherwise it is all fine, but the pk attribute is missing in the entity. On Tue, Oct 14, 2008 at 6:16 PM, Florian Aumeier [EMAIL PROTECTED] wrote: Noble Paul നോബിള് नोब्ळ् schrieb: apparently you have not specified the deltaQuery attribute in the entity. Check the delta-import section in the wiki http://wiki.apache.org/solr/DataImportHandler or you can share your data-config file and we can take a quick look here is my data-config. I configured both the deltaQuery and the query entity in one data-config. Is this the correct use case? Also, I found it easier to join the document on the database level instead of leaving it to solr. dataConfig dataSource type=JdbcDataSource driver=org.postgresql.Driver url=jdbc:postgresql://bm02:5432/bm user=user / document name=articles entity name=articles deltaQuery=SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS article_ref,a.id_blogs,a.title AS article_title, a.normalized_text, au.url AS article_url, bu.url AS blog_url, b.title AS blog_title,b.subtitle AS blog_subtitle, r.rank, coalesce(a.updated,a.published,a.added) as ts FROM articles a join blogs b on a.id_blogs = b.id join urls au on a.id_urls = au.id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN ranks r on a.id = r.id_articles WHERE b.id_urls is not null AND a.hidden is false AND b.hidden is false AND a.ref is not null AND b.ref is not null AND (rankid in (SELECT rankid FROM ranks order by rankid desc limit 1) OR rankid is null) AND coalesce(a.updated,a.published,a.added) gt; '${dataimporter.last_index_time}' query=SELECT a.id AS article_id,a.stub AS article_stub,a.ref AS article_ref,a.id_blogs,a.title AS article_title, a.normalized_text, au.url AS article_url, bu.url AS blog_url, b.title AS blog_title,b.subtitle AS blog_subtitle, r.rank, coalesce(a.updated,a.published,a.added) as ts FROM articles a join blogs b on a.id_blogs = b.id join urls au on a.id_urls = au.id join urls bu on b.id_urls = bu.id LEFT OUTER JOIN ranks r on a.id = r.id_articles WHERE b.id_urls is not null AND a.hidden is false AND b.hidden is false AND a.ref is not null AND b.ref is not null AND (rankid in (SELECT rankid FROM ranks order by rankid desc limit 1) OR rankid is null) AND coalesce(a.updated,a.published,a.added) field column=article_id name=a_id / field column=normalized_text name=norm_text / field column=article_ref name=id / field column=article_stub name=stub / field column=id_blogs name=blog_id / field column=article_title name=a_title / field column=article_url name=article_url / field column=ts name=ts / field column=rank name=rank / field column=blog_ref name=blog_ref / field column=blog_title name=b_title / field column=blog_subtitle name=subtitle / field column=blog_url name=blog_url / /entity /document /dataConfig Florian -- --Noble Paul
Re: error with delta import
The delta implementation is a bit fragile in DIH for complex queries. I recommend you do the delta-import using a full-import. It can be done as follows: define a different entity dataConfig dataSource type=JdbcDataSource driver=org.postgresql.Driver url=jdbc:postgresql://bm02:5432/bm user=user / document name=articles entity name=articles-full .. /entity entity name=articles-delta rootEntity=false query=your-delta-query-here !-- the following entity can be a copy of the articles-full entity without any delta query. because rootEntity=false for articles-delta, the following will be used for creating documents. all other rules are the same -- entity name=anyname .. /entity /entity /document when you wish to do a full-import pass the request parameter entity=articles-full for delta-import use the request parameters entity=articles-delta&clean=false (the command has to be full-import only) On Wed, Oct 15, 2008 at 1:42 PM, Florian Aumeier [EMAIL PROTECTED] wrote: Shalin Shekhar Mangar schrieb: You are missing the pk field (primary key). This is used for delta imports. I added the pk field and rebuilt the index yesterday. However, when I run the delta-import, I still have this error message in the log: INFO: Starting delta collection.
Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: articles Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity articles with URL: jdbc:postgresql://bm02:5432/bm Oct 15, 2008 9:37:27 AM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 43 Oct 15, 2008 9:37:36 AM org.apache.solr.core.SolrCore execute INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0 Oct 15, 2008 9:44:51 AM org.apache.solr.core.SolrCore execute INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0 Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: articles rows obtained : 4584 Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running DeletedRowKey() for Entity: articles Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: articles rows obtained : 0 Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: articles Oct 15, 2008 9:50:43 AM org.apache.solr.handler.dataimport.DataImporter doDeltaImport SEVERE: Delta Import Failed java.lang.NullPointerException at org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153) at org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133) at 
org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) Oct 15, 2008 9:50:58 AM org.apache.solr.core.SolrCore execute INFO: [db] webapp=/solr path=/dataimport params={} status=0 QTime=0 Regards Florian -- --Noble Paul
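Noble's delta-as-full-import layout from the message above, reformatted as well-formed XML for readability (the entity names and placeholder queries are his; the dataSource line is copied from the thread):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://bm02:5432/bm" user="user"/>
  <document name="articles">
    <!-- used for full rebuilds: request with entity=articles-full -->
    <entity name="articles-full" query="..."> .. </entity>
    <!-- used for incremental runs: request with command=full-import
         and entity=articles-delta&clean=false -->
    <entity name="articles-delta" rootEntity="false" query="your-delta-query-here">
      <!-- a copy of articles-full without any delta query; because
           rootEntity="false" on articles-delta, the rows of this inner
           entity become the documents -->
      <entity name="anyname" query="..."> .. </entity>
    </entity>
  </document>
</dataConfig>
```

The outer delta entity only selects which keys changed; the inner entity does the actual document building under the normal full-import code path.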
Re: error with delta import
the last_index_time is available only from the second run onwards; that is, it expects a full-import to be done first. It knows that by the presence of dataimport.properties in the config directory. Did you check if it is present? On Thu, Oct 16, 2008 at 5:33 PM, Florian Aumeier [EMAIL PROTECTED] wrote: Noble Paul നോബിള് नोब्ळ् schrieb: Well, when doing it the way you described below (full-import with the delta query), the '${dataimporter.last_index_time}' timestamp is empty: I guess this was fixed post 1.3. Probably you can take dataimporthandler.jar from a nightly build (you may also need to add slf4j.jar) I replaced dist/apache-solr-dataimporthandler-1.3.0.jar dist/solrj-lib/slf4j-api-1.5.3.jar dist/solrj-lib/slf4j-jdk14-1.5.3.jar with their counterparts from the nightly build, but it did not help. Then I tried to enter the date kind of hard coded (now() - '12 hours'::interval). Everything looks fine, but there are no new documents in the index. here is the log: INFO: Starting Full Import Oct 16, 2008 1:07:08 PM org.apache.solr.core.SolrCore execute INFO: [test] webapp=/solr path=/dataimport params={command=full-import&clean=false&entity=articles-delta} status=0 QTime=0 Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity articles-delta with URL: jdbc:postgresql://bm02:5432/bm Oct 16, 2008 1:07:08 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 45 Oct 16, 2008 1:14:53 PM org.apache.solr.core.SolrCore execute INFO: [test] webapp=/solr path=/dataimport params={} status=0 QTime=1 Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.SolrWriter persistStartTime INFO: Wrote last indexed time to dataimport.properties Oct 16, 2008 1:16:11 PM org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import completed successfully Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true) Oct 16, 2008 1:16:11 PM org.apache.solr.search.SolrIndexSearcher init INFO: Opening [EMAIL PROTECTED] main Oct 16, 2008 1:16:11 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush ... (autowarming) Oct 16, 2008 1:16:12 PM org.apache.solr.handler.dataimport.DocBuilder execute INFO: Time taken = 0:9:3.231 -- --Noble Paul
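As the log lines "Read dataimport.properties" and "Wrote last indexed time to dataimport.properties" show, DIH persists the start time of the last run in conf/dataimport.properties, which is where ${dataimporter.last_index_time} comes from on the next run. It is an ordinary Java properties file; the timestamp value below is illustrative:

```properties
# conf/dataimport.properties (written by DIH after a successful import;
# the colons are backslash-escaped, as in any Java properties file)
last_index_time=2008-10-16 13\:16\:11
```

If this file is missing, no previous full-import has completed, and the last_index_time variable will be empty.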
Re: dataimport, both splitBy and dateTimeFormat
Thanks David, I have updated the wiki documentation http://wiki.apache.org/solr/DataImportHandler#transformer The default transformers do not have any special privilege; each is like any normal user-provided transformer. We just identified some commonly found use cases and added transformers for them. Applying a transformer is not very 'cheap': it has to do extra checks to know whether to apply or not. On Fri, Oct 17, 2008 at 12:26 AM, David Smiley @MITRE.org [EMAIL PROTECTED] wrote: The wiki didn't mention I can specify multiple transformers. BTW, it's transformer (singular), not transformers. I did mean both NFT and DFT because I was speaking of the general case, not just mine in particular. I thought that the built-in transformers were always in effect and so I expected NFT,DFT to occur last. Sorry if I wasn't clear. Thanks for your help; it worked. ~ David Shalin Shekhar Mangar wrote: Hi David, I think you meant RegexTransformer instead of NumberFormatTransformer. Anyhow, the order in which the transformers are applied is the same as the order in which you specify them. So make sure your entity has transformers=RegexTransformer,DateFormatTransformer. On Thu, Oct 16, 2008 at 6:14 PM, David Smiley @MITRE.org [EMAIL PROTECTED] wrote: I'm trying out the dataimport capability. I have a column that is a series of dates separated by spaces like so: 1996-00-00 1996-04-00 And I'm trying to import it like so: field column=r_event_date splitBy=" " dateTimeFormat=yyyy-MM-dd / However this fails and the stack trace suggests it is first trying to apply the dateTimeFormat before splitBy. I think this is a bug... dataimport should apply DateFormatTransformer and NumberFormatTransformer last. ~ David Smiley -- View this message in context: http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20013006.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
-- View this message in context: http://www.nabble.com/dataimport%2C-both-splitBy-and-dateTimeFormat-tp20013006p20016178.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
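Per Shalin's advice, the fix for David's case is to list both transformers explicitly, in the order they should run. A sketch (the entity name and elided query are placeholders; the column and attributes come from the thread):

```xml
<!-- transformers run left to right, so RegexTransformer splits the
     space-separated string first, then DateFormatTransformer parses
     each resulting piece as a yyyy-MM-dd date -->
<entity name="events" transformer="RegexTransformer,DateFormatTransformer"
        query="...">
  <field column="r_event_date" splitBy=" " dateTimeFormat="yyyy-MM-dd"/>
</entity>
```

Reversing the list would reproduce David's failure, since DateFormatTransformer would see the whole unsplit string.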
Re: RegexTransformer debugging (DIH)
If it is a normal exception it is logged with the number of the document where it failed, and you can put it on a debugger with start=x-1&rows=1. We do not catch a Throwable or Error, so it gets slipped through. If you are adventurous enough, wrap the RegexTransformer with your own, apply that, say transformer=my.RegexWrapper, and catch a Throwable and print out the row. On Thu, Oct 16, 2008 at 9:49 PM, Jon Baer [EMAIL PROTECTED] wrote: Is there a way to prevent this from occurring (or a way to nail down the doc which is causing it?): INFO: [news] webapp=/solr path=/admin/dataimport params={command=status} status=0 QTime=0 Exception in thread Thread-14 java.lang.StackOverflowError at java.util.regex.Pattern$Single.match(Pattern.java:3313) at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4763) at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637) at java.util.regex.Pattern$All.match(Pattern.java:4079) at java.util.regex.Pattern$Branch.match(Pattern.java:4538) at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578) at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767) at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637) at java.util.regex.Pattern$All.match(Pattern.java:4079) at java.util.regex.Pattern$Branch.match(Pattern.java:4538) at java.util.regex.Pattern$GroupHead.match(Pattern.java:4578) at java.util.regex.Pattern$LazyLoop.match(Pattern.java:4767) at java.util.regex.Pattern$GroupTail.match(Pattern.java:4637) at java.util.regex.Pattern$All.match(Pattern.java:4079) Thanks. - Jon -- --Noble Paul
Re: Different XML format for multi-valued fields?
The component that writes out the values does not know if the field is multivalued or not, so if it finds only a single value it writes it out as such. On Thu, Oct 16, 2008 at 10:52 PM, oleg_gnatovskiy [EMAIL PROTECTED] wrote: Hello. I have an index built in Solr with several multi-value fields. When the multi-value field has only one value for a document, the XML returned looks like this: arr name=someIds long name=someIds5693/long /arr However, when there are multiple values for the field, the XML looks like this: arr name=someIds long11199/long long1722/long /arr Is there a reason for this difference? Also, how does faceting work with multi-valued fields? It seems that I sometimes get facet results from multi-valued fields, and sometimes I don't. Thanks. -- View this message in context: http://www.nabble.com/Different-XML-format-for-multi-valued-fields--tp20015951p20015951.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: Solr search not displaying all the indexed values.
how do you know that indexing is fine? Does a query of *:* give all the results you wanted? On Thu, Oct 16, 2008 at 3:58 PM, con [EMAIL PROTECTED] wrote: Yes. something similar to: entity name=sample1 transformer=TemplateTransformer pk=userID query=select * from USER, CUSTOMER where USER.userID = CUSTOMER.userID field column=rowtype template=sample1 / field column=userID name=userID / /entity entity name=sample2 transformer=TemplateTransformer pk=userID query=select * from USER , MANAGER where USER.desig = MANAGER.desig field column=rowtype template=sample2 / field column=userID name=userID / /entity But the searching will not give all the results even if there is only one result, whereas indexing is fine. Thanks con Noble Paul നോബിള് नोब्ळ् wrote: do you have 2 queries in 2 different entities? On Thu, Oct 16, 2008 at 3:17 PM, con [EMAIL PROTECTED] wrote: I have two queries in my data-config.xml which take values from multiple tables, like: select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id= CUSTOMER.prod_id. When I do a full-import it is indexing all the rows as expected. But when I search with *:*, it is not displaying all the values. Do I need any extra configurations? Thanks con -- View this message in context: http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010401.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- View this message in context: http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20010927.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: Solr search not displaying all the indexed values.
This is to debug your problem: remove the uniqueKey and run the import, then see if all the docs are shown. If yes, then you have duplicate ids which cause some documents to be removed. On Fri, Oct 17, 2008 at 2:01 PM, con [EMAIL PROTECTED] wrote: The response that I get while executing http://localhost:8983/solr/core0/dataimport?command=full-import shows that all the rows that are expected to be the output of that query are getting indexed. The count, str name= Indexing completed. Added/Updated: 19 documents. Deleted 0 documents. /str is as expected. But when I invoke a *:* it is displaying only 9 records. Similarly, for another entity that indexes around 500 records, a *:* gives only 4 responses. Why this inconsistency? How can I fix it before deploying it in actual production. Thanks con Noble Paul നോബിള് नोब्ळ् wrote: how do you know that indexing is fine? does a query of *:* give all the results you wanted? On Thu, Oct 16, 2008 at 3:58 PM, con [EMAIL PROTECTED] wrote: Yes. something similar to: entity name=sample1 transformer=TemplateTransformer pk=userID query=select * from USER, CUSTOMER where USER.userID = CUSTOMER.userID field column=rowtype template=sample1 / field column=userID name=userID / /entity entity name=sample2 transformer=TemplateTransformer pk=userID query=select * from USER , MANAGER where USER.desig = MANAGER.desig field column=rowtype template=sample2 / field column=userID name=userID / /entity But the searching will not give all the results even if there is only one result, whereas indexing is fine. Thanks con Noble Paul നോബിള് नोब्ळ् wrote: do you have 2 queries in 2 different entities? On Thu, Oct 16, 2008 at 3:17 PM, con [EMAIL PROTECTED] wrote: I have two queries in my data-config.xml which take values from multiple tables, like: select * from EMPLOYEE, CUSTOMER where EMPLOYEE.prod_id= CUSTOMER.prod_id. When I do a full-import it is indexing all the rows as expected. But when I search with *:*, it is not displaying all the values. Do I need any extra configurations? Thanks con -- View this message in context: http://www.nabble.com/Solr-search-not-displaying-all-the-indexed-values.-tp20010401p20029228.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
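The dedup Noble suspects comes from schema.xml's uniqueKey: if both entities emit overlapping userID values, later documents silently overwrite earlier ones (19 added, 9 visible). One hedged fix, building on the TemplateTransformer already in use, is to give each entity a distinct composite id; the id field name and template below are assumptions, not from the thread:

```xml
<!-- schema.xml: the field that deduplicates documents -->
<uniqueKey>id</uniqueKey>

<!-- data-config.xml: prefix each row's key with its entity name so
     the two entities can never collide (illustrative sketch) -->
<entity name="sample1" transformer="TemplateTransformer" pk="userID"
        query="select * from USER, CUSTOMER where USER.userID = CUSTOMER.userID">
  <field column="rowtype" template="sample1"/>
  <field column="id" template="sample1-${sample1.userID}"/>
  <field column="userID" name="userID"/>
</entity>
```

The head message in this digest does the same thing in SQL instead, concatenating product and sku ids into a unique_id column.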
Re: MySql - Solr 1.3 - Full import, how to make request pack smaller?
Do you have nested entities? Then there is a chance of firing too many requests to MySql. If you have nested entities try using CachedSqlEntityProcessor for the inner ones (only for the inner ones). I am assuming you have enough RAM to support this --Noble On Mon, Oct 20, 2008 at 3:13 PM, sunnyfr [EMAIL PROTECTED] wrote: Hi I don't have a problem of memory, but it's a production database and I stall other services on it because I put too many requests on it; how can I make it maybe longer but taking less resources of MySql? Thanks a lot, -- View this message in context: http://www.nabble.com/MySql---Solr-1.3---Full-import%2C-how-to-make-request-pack-smaller--tp20066186p20066186.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
Re: error with delta import
On Tue, Oct 21, 2008 at 12:56 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Your data-config looks fine except for one thing -- you do not need to escape '' character in an XML attribute. It may be throwing off the parsing code in DataImportHandler. not really '' is fine in attribute Another question, does the full-import work fine? On Mon, Oct 20, 2008 at 7:31 PM, Florian Aumeier [EMAIL PROTECTED] wrote: sorry to bother you again, but the delta import still does not work for me :-( We tried: * delta-import by full-import entity name=articles-delta rootEntity=false query=your-delta-query-here with entity=articles-delta&clean=false * delta-import by full-import with simplified query * delta-import with simplified query entity name=articles-delta pk=article_ref deltaQuery=SELECT * FROM full_text_view WHERE article_id lt; 300 * replaced files below with files from nightly-build 15.10.08 and rerun the delta and full imports as described above dist/apache-solr-dataimporthandler-1.3.0.jar dist/solrj-lib/slf4j-api-1.5.3.jar dist/solrj-lib/slf4j-jdk14-1.5.3.jar No matter what we do, we always end up in a situation where the dataimport status looks fine: lst name=statusMessages str name=Time Elapsed0:0:8.442/str str name=Total Requests made to DataSource1/str str name=Total Rows Fetched218/str str name=Total Documents Skipped0/str str name=Delta Dump started2008-10-20 15:31:54/str str name=Identifying Delta2008-10-20 15:31:54/str str name=Deltas Obtained2008-10-20 15:31:57/str str name=Building documents2008-10-20 15:31:57/str str name=Total Changed Documents218/str but the log reads: Oct 20, 2008 3:56:44 PM org.apache.solr.core.SolrCore execute INFO: [test] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0 Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Oct
20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection. Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: articles-full Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity articles-full with URL: jdbc:postgresql://blogmonitor02:5432/blogmonitor Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 5 Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: articles-full rows obtained : 218 Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running DeletedRowKey() for Entity: articles-full Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: articles-full rows obtained : 0 Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: articles-full Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport SEVERE: Delta Import Failed java.lang.NullPointerException at org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153) at org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359) at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) here is the full data-config: dataConfig dataSource type=JdbcDataSource driver=org.postgresql.Driver url=jdbc:postgresql://bm02:5432/bm user=bm / document name=articles entity name=articles-full pk=id query=SELECT * FROM full_text_view where article_id lt; 200 deltaQuery=SELECT * FROM full_text_view WHERE article_id lt; 300 field column=article_id name=a_id / field column=normalized_text name=norm_text / field column=article_ref name=id / field column=article_stub name=stub / field column=id_blogs name=blog_id / field column=article_title name=a_title / field column=article_url name=article_url / field column=ts name=ts / field column=rank name=rank / field
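The stack trace dies in SqlEntityProcessor.getDeltaImportQuery, i.e. while DIH is building the per-row fetch query for the delta run. In DIH versions that support an explicit deltaImportQuery attribute (as used in the thread at the top of this digest), supplying it yourself may sidestep that code path: deltaQuery returns only primary keys, and deltaImportQuery fetches each changed row through ${dataimporter.delta.<pk-column>}. A minimal sketch with illustrative table and column names:

```xml
<!-- table/column names are illustrative; note the pk column name must
     match the case of what deltaQuery actually returns -->
<entity name="articles" pk="ID"
        query="select * from articles"
        deltaQuery="select id from articles
                    where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select * from articles
                          where id = '${dataimporter.delta.id}'">
  <field column="ID" name="id"/>
</entity>
```

A pk attribute missing, or a pk whose case does not match the deltaQuery's column label, is a common trigger for exactly this kind of NullPointerException.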
Re: error with delta import
you are still doing a delta-import. With the modified data-config you must do command=full-import. On Mon, Oct 20, 2008 at 7:31 PM, Florian Aumeier [EMAIL PROTECTED] wrote: sorry to bother you again, but the delta import still does not work for me :-( We tried: * delta-import by full-import entity name=articles-delta rootEntity=false query=your-delta-query-here with entity=articles-delta&clean=false * delta-import by full-import with simplified query * delta-import with simplified query entity name=articles-delta pk=article_ref deltaQuery=SELECT * FROM full_text_view WHERE article_id lt; 300 * replaced files below with files from nightly-build 15.10.08 and rerun the delta and full imports as described above dist/apache-solr-dataimporthandler-1.3.0.jar dist/solrj-lib/slf4j-api-1.5.3.jar dist/solrj-lib/slf4j-jdk14-1.5.3.jar No matter what we do, we always end up in a situation where the dataimport status looks fine: lst name=statusMessages str name=Time Elapsed0:0:8.442/str str name=Total Requests made to DataSource1/str str name=Total Rows Fetched218/str str name=Total Documents Skipped0/str str name=Delta Dump started2008-10-20 15:31:54/str str name=Identifying Delta2008-10-20 15:31:54/str str name=Deltas Obtained2008-10-20 15:31:57/str str name=Building documents2008-10-20 15:31:57/str str name=Total Changed Documents218/str but the log reads: Oct 20, 2008 3:56:44 PM org.apache.solr.core.SolrCore execute INFO: [test] webapp=/solr path=/dataimport params={command=delta-import} status=0 QTime=0 Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport INFO: Starting Delta Import Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder doDelta INFO: Starting delta collection.
Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running ModifiedRowKey() for Entity: articles-full Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity articles-full with URL: jdbc:postgresql://blogmonitor02:5432/blogmonitor Oct 20, 2008 3:56:44 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 5 Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed ModifiedRowKey for Entity: articles-full rows obtained : 218 Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Running DeletedRowKey() for Entity: articles-full Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed DeletedRowKey for Entity: articles-full rows obtained : 0 Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta INFO: Completed parentDeltaQuery for Entity: articles-full Oct 20, 2008 3:56:46 PM org.apache.solr.handler.dataimport.DataImporter doDeltaImport SEVERE: Delta Import Failed java.lang.NullPointerException at org.apache.solr.handler.dataimport.SqlEntityProcessor.getDeltaImportQuery(SqlEntityProcessor.java:153) at org.apache.solr.handler.dataimport.SqlEntityProcessor.getQuery(SqlEntityProcessor.java:125) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:211) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:133) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:359) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:388) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) here is the full 
data-config:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
                url="jdbc:postgresql://bm02:5432/bm" user="bm"/>
    <document name="articles">
      <entity name="articles-full" pk="id"
              query="SELECT * FROM full_text_view WHERE article_id &lt; 200"
              deltaQuery="SELECT * FROM full_text_view WHERE article_id &lt; 300">
        <field column="article_id" name="a_id"/>
        <field column="normalized_text" name="norm_text"/>
        <field column="article_ref" name="id"/>
        <field column="article_stub" name="stub"/>
        <field column="id_blogs" name="blog_id"/>
        <field column="article_title" name="a_title"/>
        <field column="article_url" name="article_url"/>
        <field column="ts" name="ts"/>
        <field column="rank" name="rank"/>
        <field column="blog_ref" name="blog_ref"/>
        <field column="blog_title" name="b_title"/>
        <field column="blog_subtitle" name="subtitle"/>
        <field column="blog_url" name="blog_url"/>
      </entity>
    </document>
  </dataConfig>

What are we doing wrong?

Florian
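For what it's worth, the stack trace points at SqlEntityProcessor.getDeltaImportQuery, which in Solr 1.3 is reached when DIH must derive the per-row import query itself because the entity declares a deltaQuery but no deltaImportQuery. A sketch of an entity that supplies one explicitly; the choice of article_id as the key column is an assumption based on the config above, not something confirmed in the thread:

```xml
<!-- Sketch only: assumes article_id is the key column returned by deltaQuery -->
<entity name="articles-full" pk="article_id"
        query="SELECT * FROM full_text_view WHERE article_id &lt; 200"
        deltaQuery="SELECT article_id FROM full_text_view WHERE article_id &lt; 300"
        deltaImportQuery="SELECT * FROM full_text_view
                          WHERE article_id = '${dataimporter.delta.article_id}'">
```

The column referenced as ${dataimporter.delta.&lt;column&gt;} must match a column actually returned by deltaQuery, and pk must name one of those columns; a mismatch there is another common cause of deltas that are detected but never imported.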
Re: MySql - Solr 1.3 - Full import, how to make request pack smaller?
"Nested entity" means one entity inside another, as follows:

  <entity name="e1" ...>
    <entity name="e2" ...>
    </entity>
  </entity>

In this case, for each row in e1, one query is executed against e2.

On Mon, Oct 20, 2008 at 6:01 PM, sunnyfr [EMAIL PROTECTED] wrote:

Sorry, but "nested entities" is not an expression that I know (I'm French). What does it mean? Is it when one request spans several tables with inner joins between them?

Noble Paul നോബിള്‍ नोब्ळ् wrote:

Do you have nested entities? Then there is a chance of firing too many requests at MySQL. If you have nested entities, try using CachedSqlEntityProcessor for the inner ones (only for the inner ones). I am assuming you have enough RAM to support this. --Noble

On Mon, Oct 20, 2008 at 3:13 PM, sunnyfr [EMAIL PROTECTED] wrote:

Hi, I don't have a memory problem, but this is a production database and I am blocking other services because I put too many requests on it. How can I make the import take longer but use fewer MySQL resources? Thanks a lot.

-- View this message in context: http://www.nabble.com/MySql---Solr-1.3---Full-import%2C-how-to-make-request-pack-smaller--tp20066186p20066186.html
Sent from the Solr - User mailing list archive at Nabble.com.

-- --Noble Paul

-- View this message in context: http://www.nabble.com/MySql---Solr-1.3---Full-import%2C-how-to-make-request-pack-smaller--tp20066186p20067466.html
Sent from the Solr - User mailing list archive at Nabble.com.

-- --Noble Paul
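As a concrete illustration of the CachedSqlEntityProcessor suggestion: the inner entity's query is executed once, its rows are cached in memory, and each parent row does a cache lookup instead of a fresh SQL round trip. Table and column names below are invented for the sketch:

```xml
<entity name="e1" query="SELECT id, name FROM parent">
  <!-- Cached inner entity: one query total instead of one per parent row.
       The where attribute maps the cache key (parent_id) to the parent
       entity's column (e1.id). -->
  <entity name="e2" processor="CachedSqlEntityProcessor"
          query="SELECT parent_id, detail FROM child"
          where="parent_id=e1.id"/>
</entity>
```

This trades MySQL load for Solr-side RAM, which is why the reply above notes the RAM assumption.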
Re: function to clear up string to utf8 before indexing, where should I put it?
You can try a Transformer to do that translation.

On Wed, Oct 22, 2008 at 2:00 PM, sunnyfr [EMAIL PROTECTED] wrote:

I have a function that cleans up strings from Latin-1 to UTF-8, and I would like to know where exactly in the Java code I should put it so that strings are cleaned up before indexing. Thanks a lot for this information. I'm using Solr 1.3, MySQL, Tomcat 5.5. Sunny

-- View this message in context: http://www.nabble.com/function-to-clear-up-string-to-utf8-before-indexing%2C-where-should-I-put-it--tp20106224p20106224.html
Sent from the Solr - User mailing list archive at Nabble.com.

-- --Noble Paul
Re: function to clear up string to utf8 before indexing, where should I put it?
http://wiki.apache.org/solr/DataImportHandler#head-eb523b0943596587f05532f3ebc506ea6d9a606b

On Wed, Oct 22, 2008 at 4:41 PM, sunnyfr [EMAIL PROTECTED] wrote:

Can you tell me more about it?

Noble Paul നോബിള്‍ नोब्ळ् wrote:

You can try a Transformer to do that translation.

On Wed, Oct 22, 2008 at 2:00 PM, sunnyfr [EMAIL PROTECTED] wrote:

I have a function that cleans up strings from Latin-1 to UTF-8, and I would like to know where exactly in the Java code I should put it so that strings are cleaned up before indexing. Thanks a lot for this information. I'm using Solr 1.3, MySQL, Tomcat 5.5. Sunny

-- View this message in context: http://www.nabble.com/function-to-clear-up-string-to-utf8-before-indexing%2C-where-should-I-put-it--tp20106224p20108569.html
Sent from the Solr - User mailing list archive at Nabble.com.

-- --Noble Paul
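The charset repair itself is plain Java and independent of Solr. Below is a minimal, self-contained sketch (class and method names are my own, not from the thread) of re-decoding a string whose UTF-8 bytes were mistakenly read as Latin-1; inside a custom DIH Transformer, the same logic would be applied to each row value in transformRow:

```java
import java.nio.charset.StandardCharsets;

public class Latin1ToUtf8 {
    // Re-decode a string whose UTF-8 bytes were mistakenly read as Latin-1
    // ("mojibake"): recover the original bytes, then decode them as UTF-8.
    static String fix(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "caf\u00C3\u00A9"; // "café" whose UTF-8 bytes were read as Latin-1
        System.out.println(fix(garbled));   // prints "café"
    }
}
```

Note that this only works when the mis-decoding really was UTF-8-read-as-Latin-1; applying it to a string that was correct to begin with can corrupt it.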
Re: error with delta import
The case in point is DIH. DIH uses the standard DOM parser that comes with the JDK. If it reads the XML properly, do we need to complain? I guess that data-config.xml may not be used for any other purpose.

On Wed, Oct 22, 2008 at 10:10 PM, Walter Underwood [EMAIL PROTECTED] wrote:

On 10/22/08 8:57 AM, Steven A Rowe [EMAIL PROTECTED] wrote:

Telling people that it's not a problem (or required!) to write non-well-formed XML, because a particular XML parser accepts it, is kind of insidious.

I'm with you all the way on this. A parser which accepts non-well-formed XML is not an XML parser, since the XML spec requires reporting a fatal error. It is really easy to test these things. Modern browsers have good XML parsers, so put your test case in a test.xml file and open it in a browser. If it isn't well-formed, you'll get an error. Here is my test XML:

  <root attribute=/>

Here is what Firefox 3.0.3 says about that:

  XML Parsing Error: not well-formed
  Location: file:///Users/wunderwood/Desktop/test.xml
  Line Number 1, Column 18:
  <root attribute=/>
  -----------------^

wunder

-- --Noble Paul
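The same check is easy to run against the JDK parser that DIH itself uses, without a browser. A minimal sketch (class name is my own); it parses a string with the JDK's DOM parser and reports whether the document was accepted:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true iff the JDK's standard DOM parser accepts the document.
    static boolean wellFormed(String xml) {
        try {
            DocumentBuilder db =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler()); // keep fatal errors off stderr
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false; // fatal parse error: not well-formed
        }
    }

    public static void main(String[] args) {
        // Unquoted attribute value, as in Walter's test case: rejected.
        System.out.println(wellFormed("<root attribute=/>"));     // prints "false"
        // Quoted (even if empty) attribute value: accepted.
        System.out.println(wellFormed("<root attribute=\"\"/>")); // prints "true"
    }
}
```

As expected, the JDK parser agrees with Firefox here: an unquoted attribute value is a fatal error under the XML 1.0 well-formedness rules.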