Re: term vectors
Nice and timely topic for me. You may find this interesting: http://www.jroller.com/otis/entry/xml_dbs_vs_search_engines

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Walter Underwood
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 27, 2009 10:53:16 PM
> Subject: Re: term vectors
>
> If you really, really need to do XML-smart queries, go ahead and buy
> MarkLogic. I've worked with the principal folks there and they are
> really sharp. Their engine is awesome. XML search is hard, and you
> can't take a regular search engine, even a really good one, and make
> it do full XML without tons of work.
>
> If, as Erik and Matt suggest, you can discover a substantially simpler
> (and flat) search schema that makes your users happy, then go ahead and
> use Solr.
>
> wunder
Re: term vectors
If you really, really need to do XML-smart queries, go ahead and buy MarkLogic. I've worked with the principal folks there and they are really sharp. Their engine is awesome. XML search is hard, and you can't take a regular search engine, even a really good one, and make it do full XML without tons of work.

If, as Erik and Matt suggest, you can discover a substantially simpler (and flat) search schema that makes your users happy, then go ahead and use Solr.

wunder
Re: term vectors
I've been experimenting with the XML + Solr combo too. What I've found to be a good working solution is to:

- pick out the nodes you want as Solr documents (every div1 or div2, etc.)
- index the text only (with lots of metadata fields)
- add a field for either the xpath to that node, or save the individual nodes (at index time) into separate files and store the name of the file in the Solr doc

You could even store the chunked XML in a non-tokenized, stored field in the Solr document, as long as the XML isn't too huge.

So when you do your search, you get all of the power of Solr. Then use the xpath field or the filename field to load the chunk, then transform.

Matt
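Matt's recipe above can be sketched in a few lines. This is an illustrative Python sketch using only the standard library; the element names (div1, head) and the Solr field names are assumptions for illustration, not taken from any real schema:

```python
# Split an XML file into per-<div1> chunks and emit one Solr document per
# chunk, carrying an xpath-like locator field that can later be used to
# reload and transform the original node.
import xml.etree.ElementTree as ET

def chunk_to_solr_docs(xml_text):
    root = ET.fromstring(xml_text)
    docs = []
    for i, div in enumerate(root.iter("div1"), start=1):
        docs.append({
            "id": f"doc-{i}",
            "title": div.findtext("head", default=""),
            # flatten all text under the node into one searchable field
            "text": " ".join(div.itertext()),
            # locator used at display time to fetch the original chunk
            "xpath": f"//div1[{i}]",
        })
    return docs

if __name__ == "__main__":
    sample = ("<TEI><div1><head>One</head><p>quijote rides</p></div1>"
              "<div1><head>Two</head><p>windmills</p></div1></TEI>")
    for d in chunk_to_solr_docs(sample):
        print(d["id"], d["xpath"], d["title"])
```

Each dict would then be posted to Solr as one document; the stored xpath (or filename) field is what lets you round-trip back to the source XML.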
Re: term vectors
On May 27, 2009, at 4:56 PM, Yosvanys Aponte wrote:
> i understand what you say, but the problem i have is:
> users can make a query like this:
>
> //tei.2//p"[quijote"]

A couple of problems with this... for one, there's no query parser that'll interpret that syntax as you mean it in Solr. And also, indexing the hierarchical structure (of TEI, which I'm painfully familiar with) requires flattening or doing lots of overlapped indexing of fields that represent the hierarchy at various levels.

In my experience with the TEI domain, users don't *really* want to query like that, even though they'll say they do, because it's the only way they're used to doing it.

Perhaps step back and ask yourself and your users what is really desired from the search application you're building. What's the goal? What needs to be displayed? What type of query entry form will they be typing into?

	Erik
Re: term vectors
I understand what you say, but the problem I have is that a user can make a query like this:

//tei.2//p"[quijote"]

The user wants to find all paragraphs that belong to tei.2 and contain the word "quijote". So I have to search both structure and content: I have an index format in which I save the structure, and the content I index with Solr. First I search for all paths that match the query, and then I need to search in Solr, but only within the pointers I have saved in each xpath list. I don't know how to search in specific fields (pointers) only; I think that would be faster than searching in all Solr fields.

thanks

Erik Hatcher wrote:
> Aponte - I'm not quite understanding your question. Could you provide
> some detailed examples of what you're trying to accomplish?
>
> Just guessing from your post, it seems what you are after is
> flattening your structure so that it fits within Solr/Lucene's
> document & field capabilities and then doing fielded search from
> there. I'm not sure term vectors relates to what you're doing, but
> we'll know more if you post some more details.
>
> Thanks,
> Erik
>
> On May 26, 2009, at 5:35 AM, Yosvanys Aponte wrote:
>> Hello!!
>>
>> I'm working with Solr, indexing the content of XML files from a digital
>> library. I have the structure of these XML files saved in a tree in an
>> optimal form, and in the leaves I need to have the reference of the field
>> saved in Solr. I do this because when I get the leaves I have done some
>> part of the query, that is the structure part, but the content is in Solr,
>> and I need to search only in some fields, not in the whole Solr database.
>> Could term vectors help me to do this, or is there another way to do it?
>>
>> thanks
>> Aponte
>> --
>> View this message in context: http://www.nabble.com/term-vectors-tp23719768p23719768.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/term-vectors-tp23719768p23750663.html
Sent from the Solr - User mailing list archive at Nabble.com.
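The two-phase idea described above (resolve the structural //tei.2//p part against a separate xpath index first, then restrict the Solr content search to the surviving pointers) could look roughly like this. The field names (id, text) and the parameter layout are assumptions for illustration, not the poster's actual setup:

```python
# Phase 1 (outside Solr) yields a list of pointer IDs that satisfy the
# structural constraint; phase 2 asks Solr for the content term, narrowed
# to those pointers with a filter query instead of searching everything.
from urllib.parse import urlencode

def build_solr_query(term, pointer_ids,
                     base="http://localhost:8983/solr/select"):
    # fq restricts the candidate set to the documents the structure
    # phase found, so scoring only considers those documents
    fq = "id:(" + " OR ".join(pointer_ids) + ")"
    params = {"q": f"text:{term}", "fq": fq, "wt": "json"}
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    print(build_solr_query("quijote", ["p12", "p47"]))
```

Because fq results are cached independently of the main query, repeating the same structural restriction with different terms stays cheap.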
index time boosting on multivalued fields
I can set the boost of a field or doc at index time using the boost attr in the update message, e.g.

  <field name="tag" boost="2.0">pet</field>

But that won't work for multivalued fields, according to the RelevancyFAQ:

  <field name="tag" boost="2.0">pet</field>
  <field name="tag" boost="7.0">animal</field>

(I assume it applies the last boost parsed to all terms?)

Now, say I'd like to do index-time boosting of a multivalued field with each value having a unique boost. I could simply index the field multiple times:

  <field name="tag" boost="2.0">pet</field>
  <field name="tag" boost="2.0">pet</field>
  <field name="tag" boost="7.0">animal</field>

But is there a more exact way?
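For illustration, a small sketch of building such an update message programmatically, with one <field> element per value, each carrying its own boost attribute. Whether Solr keeps the boosts distinct rather than collapsing them onto the whole field is exactly the open question above, so treat this as a sketch of the message format only; the field names are placeholders:

```python
# Build a Solr <add> update message where a multivalued field is written
# as repeated <field> elements, each with its own boost attribute.
import xml.etree.ElementTree as ET

def add_doc(doc_id, boosted_tags):
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = doc_id
    for value, boost in boosted_tags:
        # one element per value, boost serialized as an attribute
        f = ET.SubElement(doc, "field", name="tag", boost=str(boost))
        f.text = value
    return ET.tostring(add, encoding="unicode")

if __name__ == "__main__":
    print(add_doc("1", [("pet", 2.0), ("animal", 7.0)]))
```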
Re: 1.4 Replication
Bug filed. Thank you.

On Wed, 2009-05-27 at 22:40 +0530, Shalin Shekhar Mangar wrote:
> On Wed, May 27, 2009 at 9:01 PM, Matthew Gregg wrote:
>> That is disappointing then. Restricting by IP may be doable, but much
>> more work than basic auth.
>
> The beauty of open source is that this can be changed :) Please open an
> issue; we can have basic HTTP authentication made configurable.

--
Matthew Gregg
Re: 1.4 Replication
On Wed, May 27, 2009 at 9:01 PM, Matthew Gregg wrote:
> That is disappointing then. Restricting by IP may be doable, but much
> more work than basic auth.

The beauty of open source is that this can be changed :) Please open an issue; we can have basic HTTP authentication made configurable.

--
Regards,
Shalin Shekhar Mangar.
Re: 1.4 Replication
That is disappointing then. Restricting by IP may be doable, but much more work than basic auth.

On Wed, 2009-05-27 at 20:41 +0530, Noble Paul നോബിള് नोब्ळ् wrote:
> replication has no builtin security

--
Matthew Gregg
Re: 1.4 Replication
replication has no builtin security

On Wed, May 27, 2009 at 8:37 PM, Matthew Gregg wrote:
> I would like to protect both reads and writes. Reads could have a
> significant impact. I guess the answer is no, replication has no
> built-in security?

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: 1.4 Replication
I would like to protect both reads and writes. Reads could have a significant impact. I guess the answer is no, replication has no built-in security?

On Wed, 2009-05-27 at 20:11 +0530, Noble Paul നോബിള് नोब्ळ् wrote:
> The question is what all you wish to protect. There are 'read' as well
> as 'write' attributes.
>
> The reads are the ones which will not cause any harm other than
> consuming some CPU cycles. The writes are the ones which can change the
> state of the system.
>
> The slave uses the 'read' APIs, which I feel may not need to be
> protected. The other API methods can have security, say snappull,
> disableSnapPoll, etc.

--
Matthew Gregg
Re: Indexing from DB connection issue
All I can suggest is: write a simple JDBC program and see if it works from that machine (any privilege issue, etc.?)

On Wed, May 27, 2009 at 7:15 PM, ahammad wrote:
> Hello,
>
> I tried your suggestion, and it still gives me the same error.
>
> I'd like to point out again that the same folder/config setup is running
> on my machine with no issues, but it gives me that stack trace in the
> logs on the server.
>
> Refreshing the page usually results in the requests-to-datasource /
> rows-fetched numbers increasing. In my case the requests to the
> datasource stay at 1 regardless. Looks like it tries once and fails,
> then it terminates the process...
>
> Regards
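Noble's suggestion (a standalone program that talks to the database directly, outside Solr) can be sketched like this. It is written against the generic Python DB-API rather than JDBC, with the connect callable injected so any driver can be plugged in; the probe query "select 1 from dual" assumes Oracle, and the driver choice (e.g. cx_Oracle.connect) is the reader's:

```python
# Standalone connectivity probe: open a connection from the troubled
# machine and run a trivial query, so driver or privilege problems
# surface outside Solr's DataImportHandler.
def check_db(connect, probe_sql="select 1 from dual"):
    try:
        conn = connect()
    except Exception as exc:
        return f"connect failed: {exc}"
    try:
        cur = conn.cursor()
        cur.execute(probe_sql)
        cur.fetchone()
        return "ok"
    except Exception as exc:
        return f"query failed: {exc}"
    finally:
        conn.close()
```

If this probe fails from the server but succeeds from the workstation, the problem is environmental (firewall, TNS config, account privileges), not the DIH configuration.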
Re: 1.4 Replication
The question is what all you wish to protect. There are 'read' as well as 'write' attributes.

The reads are the ones which will not cause any harm other than consuming some CPU cycles.

The writes are the ones which can change the state of the system.

The slave uses the 'read' APIs, which I feel may not need to be protected.

The other API methods can have security, say snappull, disableSnapPoll, etc.

On Wed, May 27, 2009 at 7:47 PM, Matthew Gregg wrote:
> Yes, I would like to put /replication behind basic auth, which I can do,
> but replication fails. I naively tried the obvious
> http://user:p...@host/replication, but that fails.
>
> It's pretty easy to put basic auth in front of Solr, but the replication
> infra. in 1.4 doesn't seem to support it. Or does it, and I just don't
> know how?
>
> --
> Matthew Gregg

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: 1.4 Replication
I've not figured out a way to use basic auth with replication. We ended up using IP-based auth. It shouldn't be too tricky to add basic-auth support as, IIRC, the replication is based on the commons httpclient library.

On 27 May 2009, at 15:17, Matthew Gregg wrote:
> On Wed, 2009-05-27 at 19:06 +0530, Noble Paul നോബിള് नोब्ळ् wrote:
>> do you mean protecting the url /replication ?
> Yes, I would like to put /replication behind basic auth, which I can do,
> but replication fails. I naively tried the obvious
> http://user:p...@host/replication, but that fails.

Toby Cole
Software Engineer, Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
W: www.semantico.com
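For what it's worth, the missing piece is small on the wire: basic auth is just an Authorization header carrying base64(user:password), which is what commons-httpclient would attach if it were wired up with credentials. A sketch of the equivalent request construction (the credentials and URL are placeholders):

```python
# Build an HTTP request for the replication handler with an HTTP Basic
# Authorization header: "Basic " + base64("user:password").
import base64
from urllib.request import Request

def with_basic_auth(url, user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = Request(url)
    req.add_header("Authorization", f"Basic {token}")
    return req

if __name__ == "__main__":
    r = with_basic_auth("http://master:8983/solr/replication",
                        "user", "secret")
    print(r.get_header("Authorization"))
```

Note that Basic credentials are only obscured, not encrypted, so this is only sensible over a trusted network or TLS.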
Re: 1.4 Replication
On Wed, 2009-05-27 at 19:06 +0530, Noble Paul നോബിള് नोब्ळ् wrote:
> On Wed, May 27, 2009 at 6:48 PM, Matthew Gregg wrote:
>> Does replication in 1.4 support passing credentials/basic auth? If not,
>> what is the best option to protect replication?
> do you mean protecting the url /replication ?

Yes, I would like to put /replication behind basic auth, which I can do, but replication fails. I naively tried the obvious http://user:p...@host/replication, but that fails.

> ideally Solr is expected to run in an unprotected environment. if you
> wish to introduce some security it has to be built by you.

I guess you meant Solr is expected to run in a "protected" environment? It's pretty easy to put basic auth in front of Solr, but the replication infra. in 1.4 doesn't seem to support it. Or does it, and I just don't know how?

--
Matthew Gregg
Re: Indexing from DB connection issue
Hello,

I tried your suggestion, and it still gives me the same error.

I'd like to point out again that the same folder/config setup is running on my machine with no issues, but it gives me that stack trace in the logs on the server.

When I do the full data import request through the browser, I get this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:0:1.329</str>
    <str name="Total Requests made to DataSource">1</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Processed">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2009-05-27 09:42:24</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

Refreshing the page usually results in the requests-to-datasource / rows-fetched numbers increasing. In my case the requests to the datasource stay at 1 regardless. Looks like it tries once and fails, then it terminates the process...

Regards

Noble Paul നോബിള് नोब्ळ्-2 wrote:
> no need to rename.
Re: 1.4 Replication
On Wed, May 27, 2009 at 6:48 PM, Matthew Gregg wrote: > Does replication in 1.4 support passing credentials/basic auth? If not > what is the best option to protect replication? do you mean protecting the url /replication ? ideally Solr is expected to run in an unprotected environment. if you wish to introduce some security it has to be built by you. > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Indexing from DB connection issue
no need to rename.

On Wed, May 27, 2009 at 6:50 PM, ahammad wrote:
> Would I need to rename it or refer to it somewhere? Or can I keep the
> existing name (apache-solr-dataimporthandler-1.4-dev.jar)?
>
> Cheers
>
> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>> take the trunk dih.jar. use winzip/winrar or any tool and just delete
>> all the files other than ClobTransformer.class. put that jar into
>> solr.home/lib
>>
>> On Wed, May 27, 2009 at 6:10 PM, ahammad wrote:
>>> Hmmm, that's probably a good idea... although it does not explain how
>>> my current local setup works.
>>>
>>> Can you please explain how this is done? I am assuming that I need to
>>> add the class itself to the source of Solr 1.3, then compile the code,
>>> and take the new .war file and put it in Tomcat? If that is correct,
>>> where in the source folders would the ClobTransformer.class file go?
>>>
>>> Thanks.
>>>
>>> Noble Paul നോബിള് नोब्ळ्-2 wrote:
>>>> I guess it is better to copy the ClobTransformer.class alone and use
>>>> the old Solr 1.3 DIH
>>>>
>>>> On Tue, May 26, 2009 at 11:50 PM, ahammad wrote:
>>>>> I have an update:
>>>>>
>>>>> I played around with it some more and it seems like it's being
>>>>> caused by the ClobTransformer. If I remove the clob="true" from the
>>>>> field part in the data-config, it works fine.
>>>>>
>>>>> The Solr install is a multicore one. I placed the
>>>>> apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in
>>>>> the {solrHome}/core1/lib directory (I only need it for the first
>>>>> core). Is there something else I need to do for it to work?
>>>>>
>>>>> I don't recall doing an additional step when I did this a few weeks
>>>>> ago on my local machine.
>>>>>
>>>>> Any help is appreciated.
>>>>>
>>>>> Regards
>>>>>
>>>>> ahammad wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I am trying to index directly from an Oracle DB. This is what
>>>>>> appears in the stack trace:
>>>>>>
>>>>>> SEVERE: Full Import failed
>>>>>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>>>>>> Unable to execute query: select * from ARTICLE Processing Document # 1
>>>>>>   at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:186)
>>>>>>   at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143)
>>>>>>   at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43)
>>>>>>   at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>>>>>>   at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
>>>>>>   at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285)
>>>>>>   at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>>>>   at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>>>>   at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>>>>   at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>>>>   at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>>>> Caused by: java.sql.SQLException: Closed Connection
>>>>>>   at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
>>>>>>   at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
>>>>>>   at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208)
>>>>>>   at oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755)
>>>>>>   at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:174)
>>>>>>   ... 10 more
>>>>>>
>>>>>> Funny thing is, the data import works on my local machine. I moved
>>>>>> all the config files to another server, and I get this. I reindexed
>>>>>> on my local machine immediately after in order to verify that the DB
>>>>>> works, and it indexes fine.
>>>>>>
>>>>>> Here is my data-config file, just in case:
>>>>>>
>>>>>> <dataConfig>
>>>>>>   <dataSource driver="..." url="..." user="xxx" password="xxx"/>
>>>>>>   <document>
>>>>>>     <entity name="ARTICLE" query="select * from ARTICLE"
>>>>>>             transformer="ClobTransformer">
>>>>>>       <field column="..." clob="true" />
>>>>>>       <entity name="AUTHOR" query="select ID_A from ARTICLE_AUTHOR
>>>>>>               where ID_A='${ARTICLE.ID}'">
>>>>>>         <field column="ID_A" name="author" />
>>>>>>       </entity>
>>>>>>     </entity>
>>>>>>   </document>
>>>>>> </dataConfig>
>>>>>>
>>>>>> I am using the 1.3 release version, with the 1.4 DIH jar file for
>>>>>> the ClobTransformer.
Re: Indexing from DB connection issue
Would I need to rename it or refer to it somewhere? Or can I keep the existing name (apache-solr-dataimporthandler-1.4-dev.jar)? Cheers Noble Paul നോബിള് नोब्ळ्-2 wrote: > > take the trunk dih.jar. use winzip/winrar or any tool and just delete > all the files other than ClobTransformer.class. put that jar into > solr.home/lib > > On Wed, May 27, 2009 at 6:10 PM, ahammad wrote: >> >> Hmmm, that's probably a good idea...although it does not explain how my >> current local setup works. >> >> Can you please explain how this is done? I am assuming that I need to add >> the class itself to the source of solr 1.3, and then compile the code, >> and >> take the new .war file and put it in Tomcat? If that is correct, where in >> the source folders would the ClobTransformer.class file go? >> >> Thanks. >> >> >> >> Noble Paul നോബിള് नोब्ळ्-2 wrote: >>> >>> I guess it is better to copy the ClobTransformer.class alone and use >>> the old Solr1.3 DIH >>> >>> >>> >>> >>> >>> On Tue, May 26, 2009 at 11:50 PM, ahammad >>> wrote: I have an update: I played around with it some more and it seems like it's being caused by the ClobTransformer. If I remove the 'clob="true"' from the field part in the data-config, it works fine. The Solr install is a multicore one. I placed the apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in the {solrHome}/core1/lib directory (I only need it for the first core). Is there something else I need to do for it to work? I don't recall doing an additional step when I did this a few weeks ago on my local machine. Any help is appreciated. Regards ahammad wrote: > > Hello all, > > I am tyring to index directly from an Oracle DB. 
This is what appears > in > the stack trace: > > SEVERE: Full Import failed > org.apache.solr.handler.dataimport.DataImportHandlerException: Unable > to > execute query: select * from ARTICLE Processing Document # 1 > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) > at > org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) > at > org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) > at > org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) > at > org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) > Caused by: java.sql.SQLException: Closed Connection > at > oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) > at > oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) > at > oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) > at > oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) > at > org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:174) > ... 10 more > > Funny thing is, the data import works on my local machine. I moved all > the > config files to another server, and I get this. I reindexed on my > local > machine immediately after in order to verify that the DB works, and it > indexes fine. 
> > Here is my data-config file, just in case: > > > user="xxx" password="xxx"/> > > transformer="ClobTransformer"> > > clob="true" /> > > > query="select ID_A from ARTICLE_AUTHOR > where ID_A='${ARTICLE.ID}'"> > name="author" /> > > > > > > > I am using the 1.3 release version, with the 1.4 DIH jar file for the > Clob > Transformer. What could be causing this? > > Cheers > -- View this message in context: http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23728596.html Sent from the Solr - U
1.4 Replication
Does replication in 1.4 support passing credentials/basic auth? If not, what is the best option to protect replication?
Re: Indexing from DB connection issue
take the trunk dih.jar. use winzip/winrar or any tool and just delete all the files other than ClobTransformer.class. put that jar into solr.home/lib On Wed, May 27, 2009 at 6:10 PM, ahammad wrote: > > Hmmm, that's probably a good idea...although it does not explain how my > current local setup works. > > Can you please explain how this is done? I am assuming that I need to add > the class itself to the source of solr 1.3, and then compile the code, and > take the new .war file and put it in Tomcat? If that is correct, where in > the source folders would the ClobTransformer.class file go? > > Thanks. > > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> >> I guess it is better to copy the ClobTransformer.class alone and use >> the old Solr1.3 DIH >> >> >> >> >> >> On Tue, May 26, 2009 at 11:50 PM, ahammad wrote: >>> >>> I have an update: >>> >>> I played around with it some more and it seems like it's being caused by >>> the >>> ClobTransformer. If I remove the 'clob="true"' from the field part in the >>> data-config, it works fine. >>> >>> The Solr install is a multicore one. I placed the >>> apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in the >>> {solrHome}/core1/lib directory (I only need it for the first core). Is >>> there >>> something else I need to do for it to work? >>> >>> I don't recall doing an additional step when I did this a few weeks ago >>> on >>> my local machine. >>> >>> Any help is appreciated. >>> >>> Regards >>> >>> >>> ahammad wrote: Hello all, I am tyring to index directly from an Oracle DB. 
This is what appears in the stack trace: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select * from ARTICLE Processing Document # 1 at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) Caused by: java.sql.SQLException: Closed Connection at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) at oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:174) ... 10 more Funny thing is, the data import works on my local machine. I moved all the config files to another server, and I get this. I reindexed on my local machine immediately after in order to verify that the DB works, and it indexes fine. 
Here is my data-config file, just in case: >>> user="xxx" password="xxx"/> >>> transformer="ClobTransformer"> >>> clob="true" /> >>> query="select ID_A from ARTICLE_AUTHOR where ID_A='${ARTICLE.ID}'"> >>> name="author" /> I am using the 1.3 release version, with the 1.4 DIH jar file for the Clob Transformer. What could be causing this? Cheers >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23728596.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> >> > > -- > View this message in context: > http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23741712.html > Sent from the Solr - User mailing list archive at Nab
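Noble Paul's tip above (delete everything from the trunk DIH jar except ClobTransformer.class, then drop the jar into solr.home/lib) can be scripted instead of done by hand in WinZip/WinRAR. A minimal Python sketch; the output jar name is made up, and the class path inside the jar follows the standard DIH package, so verify it against your build:

```python
import zipfile

# Sketch of Noble Paul's tip: keep only ClobTransformer.class (plus the
# manifest) from the trunk DIH jar. SRC/DST names are illustrative.
SRC = "apache-solr-dataimporthandler-1.4-dev.jar"
DST = "clobtransformer-only.jar"
KEEP = (
    "META-INF/MANIFEST.MF",
    "org/apache/solr/handler/dataimport/ClobTransformer.class",
)

def trim_jar(src, dst, keep=KEEP):
    """Write dst as a copy of jar src containing only the entries in keep."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            if item.filename in keep:
                zout.writestr(item, zin.read(item.filename))
```

Usage would be trim_jar(SRC, DST), then put DST into solr.home/lib as described in the thread.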
Re: Solr distributed, questions about upgrading
I agree. It is always a good idea to start with the example config/schema in the version that you are upgrading to and work your specific settings back into them. Newer versions of Solr will probably have new or changed settings. Even though sometimes the config/schema is backward compatible, I think it is always better to move ahead. Bill On Tue, May 26, 2009 at 5:09 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > > Valdir, > > Yes, although I can't give you exact pointers about what changed since 1.1 > (2006?), so my suggestion is that you take the example solrconfig/schema and > inject your settings in it. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: Valdir Salgueiro > > To: solr-user@lucene.apache.org > > Sent: Tuesday, May 26, 2009 3:24:42 PM > > Subject: Solr distributed, questions about upgrading > > > > hi, I have an application, and I'm wanting to port from solr 1.1 to 1.3 so > I > > can have distribution (various shards); do I have to change something in > > solrconfig.xml or schema.xml? > >
Re: Index replication without HTTP
If you are running on Unix/Linux, you should be able to use the scripts-based replication with some minor modification. You will need to change the scripts where they try to use HTTP to trigger a commit in Solr. Bill On Wed, May 27, 2009 at 5:36 AM, Ashish P wrote: > > Hi, > I have two instances of embedded server (no HTTP) running on a network with > two separate indexes. > I want to replicate changes from one index to the other. > Is there any way? > Thanks, > Ashish > -- > View this message in context: > http://www.nabble.com/Index-replication-without-HTTP-tp23739156p23739156.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
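For the embedded (no-HTTP) case, the core of what the scripts do is just copying the master's index directory and swapping it in on the slave. A rough Python sketch of that idea, not the stock replication scripts themselves; the function and staging-directory names are invented for illustration:

```python
import os
import shutil

def replicate_index(master_dir, slave_dir):
    """Copy master's index directory and atomically swap it in on the slave.

    A rough stand-in for scripts-based replication when no HTTP endpoint
    exists; assumes the master is not writing during the copy (the real
    scripts take a hard-linked snapshot to guarantee that).
    """
    staging = slave_dir + ".new"
    if os.path.exists(staging):
        shutil.rmtree(staging)
    shutil.copytree(master_dir, staging)
    old = slave_dir + ".old"
    if os.path.exists(slave_dir):
        os.rename(slave_dir, old)
    os.rename(staging, slave_dir)
    if os.path.exists(old):
        shutil.rmtree(old)
    # The embedded slave must then reopen its core/IndexReader to see the
    # new files; that reopen replaces the HTTP commit trigger the stock
    # scripts rely on.
```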
Re: Indexing from DB connection issue
Hmmm, that's probably a good idea...although it does not explain how my current local setup works. Can you please explain how this is done? I am assuming that I need to add the class itself to the source of solr 1.3, and then compile the code, and take the new .war file and put it in Tomcat? If that is correct, where in the source folders would the ClobTransformer.class file go? Thanks. Noble Paul നോബിള് नोब्ळ्-2 wrote: > > I guess it is better to copy the ClobTransformer.class alone and use > the old Solr1.3 DIH > > > > > > On Tue, May 26, 2009 at 11:50 PM, ahammad wrote: >> >> I have an update: >> >> I played around with it some more and it seems like it's being caused by >> the >> ClobTransformer. If I remove the 'clob="true"' from the field part in the >> data-config, it works fine. >> >> The Solr install is a multicore one. I placed the >> apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in the >> {solrHome}/core1/lib directory (I only need it for the first core). Is >> there >> something else I need to do for it to work? >> >> I don't recall doing an additional step when I did this a few weeks ago >> on >> my local machine. >> >> Any help is appreciated. >> >> Regards >> >> >> ahammad wrote: >>> >>> Hello all, >>> >>> I am tyring to index directly from an Oracle DB. 
This is what appears in >>> the stack trace: >>> >>> SEVERE: Full Import failed >>> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to >>> execute query: select * from ARTICLE Processing Document # 1 >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) >>> at >>> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >>> at >>> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >>> at >>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >>> at >>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >>> at >>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) >>> Caused by: java.sql.SQLException: Closed Connection >>> at >>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) >>> at >>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) >>> at >>> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) >>> at >>> oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) >>> at >>> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:174) >>> ... 10 more >>> >>> Funny thing is, the data import works on my local machine. I moved all >>> the >>> config files to another server, and I get this. 
I reindexed on my local >>> machine immediately after in order to verify that the DB works, and it >>> indexes fine. >>> >>> Here is my data-config file, just in case: >>> >>> >>> >> user="xxx" password="xxx"/> >>> >>> >> transformer="ClobTransformer"> >>> >>> >> clob="true" /> >>> >>> >>> >> query="select ID_A from ARTICLE_AUTHOR >>> where ID_A='${ARTICLE.ID}'"> >>> >> name="author" /> >>> >>> >>> >>> >>> >>> >>> I am using the 1.3 release version, with the 1.4 DIH jar file for the >>> Clob >>> Transformer. What could be causing this? >>> >>> Cheers >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23728596.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23741712.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: creating new fields at index time - is it possible?
On Wed, May 27, 2009 at 5:41 PM, Kir4 wrote: > > Hi guys!! > I have started studying Solr, and I was wondering if anyone would be so kind > as to help me understand a couple of things. > > I have XML data that looks like this: > > ID-0 > > Cheap hotels in Paris? > > > > I want to search the data based on a location hierarchy, so I must modify > the XML data to obtain something like this: > > > ID-0 > Paris > France > Europe > > Cheap hotels in Paris? > > > > I intend to create a plugin to do this using Geonames data (I will use > Geonames hierarchy and resource codes, not "Paris", "city", etc...) at index > time. I will have to parse the text in the "post" field, check for each word > a database (or XML files) to see if they are names of places, obtain the > elements of the hierarchy, and place them in each field. > > My questions: > 1) am I forcing Solr to do things it wasn't made for? would it be better to > process the XML data BEFORE feeding it to Solr? I guess this is best done outside of Solr > 2) could you suggest a function I could modify? I was considering rewriting > copyField; If you wish to plugin your code try this http://wiki.apache.org/solr/UpdateRequestProcessor > 3) if I rewrite an analyzer to create these fields, is there any way to > display the output? (the tokens) > > Thanks in advance!!! > Kir4. =) > -- > View this message in context: > http://www.nabble.com/creating-new-fields-at-index-time---is-it-possible--tp23741267p23741267.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
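The "process the XML before feeding it to Solr" route suggested above can be sketched as a small enrichment step run before posting documents. The gazetteer below is a toy stand-in for real Geonames data, the function name is invented, and the field names simply mirror the example in the question:

```python
import xml.etree.ElementTree as ET

# Toy gazetteer; in the real plugin this lookup would be backed by
# Geonames data (hierarchy and resource codes rather than plain names).
GAZETTEER = {
    "paris": {"city": "Paris", "country": "France", "continent": "Europe"},
}

def enrich(doc_xml):
    """Add location-hierarchy fields to a Solr <doc> element based on
    place names found in its 'post' field, before the doc is posted."""
    doc = ET.fromstring(doc_xml)
    post = next(f.text for f in doc.findall("field")
                if f.get("name") == "post")
    for word in post.lower().replace("?", " ").split():
        for name, value in GAZETTEER.get(word, {}).items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(doc, encoding="unicode")
```

Doing this outside Solr keeps the index-time pipeline simple; the UpdateRequestProcessor mentioned above is the in-Solr alternative for the same job.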
creating new fields at index time - is it possible?
Hi guys!! I have started studying Solr, and I was wondering if anyone would be so kind as to help me understand a couple of things. I have XML data that looks like this: ID-0 Cheap hotels in Paris? I want to search the data based on a location hierarchy, so I must modify the XML data to obtain something like this: ID-0 Paris France Europe Cheap hotels in Paris? I intend to create a plugin to do this using Geonames data (I will use Geonames hierarchy and resource codes, not "Paris", "city", etc...) at index time. I will have to parse the text in the "post" field, check for each word a database (or XML files) to see if they are names of places, obtain the elements of the hierarchy, and place them in each field. My questions: 1) am I forcing Solr to do things it wasn't made for? would it be better to process the XML data BEFORE feeding it to Solr? 2) could you suggest a function I could modify? I was considering rewriting copyField; 3) if I rewrite an analyzer to create these fields, is there any way to display the output? (the tokens) Thanks in advance!!! Kir4. =) -- View this message in context: http://www.nabble.com/creating-new-fields-at-index-time---is-it-possible--tp23741267p23741267.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [Solr Wiki] Update of "FrontPage" by OscarBernal
I would just move it to the blog page. On May 26, 2009, at 9:17 PM, Erik Hatcher wrote: Oscar - are you on either of these lists? The front page does not seem like an appropriate place to list a blog entry. And the technique used in that blog entry doesn't seem like a best practice to me anyway, though I'd be curious to hear it debated a bit on solr- user on ways to offer suggest capabilities. Erik On May 26, 2009, at 9:05 PM, Apache Wiki wrote: Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification. The following page has been changed by OscarBernal: http://wiki.apache.org/solr/FrontPage The comment on the change is: Added Search Suggest functionality link -- * SolrPlugins * SolrRelevancyCookbook * LargeIndexes - Covers how to design and operate a very large Solr index. + * [http://oshyn.com/_bpost_1906/Implementing_Search_Suggest_with_Apache_Solr_(Part1) Search Suggest Functionality] + === Solr Clients === * IntegratingSolr -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: [Solr Wiki] Update of "FrontPage" by OscarBernal
>Oscar - are you on either of these lists? The front page does not >seem like an appropriate place to list a blog entry. And the >technique used in that blog entry doesn't seem like a best practice to >me anyway, though I'd be curious to hear it debated a bit on solr-user >on ways to offer suggest capabilities. > > Erik I do not think it should be on the front page. You already have the "SolrResources" page which seems the perfect place for this type of content. However I am not sure if it should go under the "blogs" or "articles" section of the "SolrResources" page. IMHO the blogs section should be for blogs which are totally or substantially about solr or lucene. A one off blog entry which focuses on solr should, I think, be considered an "article". However the more folk blogging about solr the better! Fergus. > > >On May 26, 2009, at 9:05 PM, Apache Wiki wrote: > >> Dear Wiki user, >> >> You have subscribed to a wiki page or wiki category on "Solr Wiki" >> for change notification. >> >> The following page has been changed by OscarBernal: >> http://wiki.apache.org/solr/FrontPage >> >> The comment on the change is: >> Added Search Suggest functionality link >> >> -- >> * SolrPlugins >> * SolrRelevancyCookbook >> * LargeIndexes - Covers how to design and operate a very large >> Solr index. >> + * >> [http://oshyn.com/_bpost_1906/Implementing_Search_Suggest_with_Apache_Solr_(Part1) >> >> Search Suggest Functionality] >> + >> >> === Solr Clients === >> * IntegratingSolr -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Re: Recover crashed solr index
Hmm... so in fact it looks like Solr had done a number of commits already (especially, given how large your generation is -- the "cje" in segments_cje means there were a number of commits). Were there any other exceptions leading up to this? Disk full? Anything unusual in your Solr configuration? Is there any chance that a 2nd Solr core attempted to access this same directory? Mike On Tue, May 26, 2009 at 9:33 PM, Wang Guangchen wrote: > Hi Mike, > > The index is autoCommit every 1000 docs. I set this to increase the indexing > speed. What is the best configuration do you suggest for each commit cycle? > > Thank you very much for your help. > > Following is the original exception: > > java.lang.RuntimeException: java.io.FileNotFoundException: > /solr/example/data/index/segments_cje (No such file or directory) > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1085) > at org.apache.solr.core.SolrCore.(SolrCore.java:561) > at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:121) > > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69) > at > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) > at > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) > > at > org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:108) > > at > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709) > at > org.apache.catalina.core.StandardContext.start(StandardContext.java:4363) > at > org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791) > > at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771) > at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525) > at > org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:627) > > at > org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553) > at 
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488) > at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1149) > at > org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311) > at > org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117) > > at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053) > at org.apache.catalina.core.StandardHost.start(StandardHost.java:719) > at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045) > at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443) > at org.apache.catalina.core.StandardService.start(StandardService.java:516) > at org.apache.catalina.core.StandardServer.start(StandardServer.java:710) > at org.apache.catalina.startup.Catalina.start(Catalina.java:578) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288) > at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413) Caused by: > java.io.FileNotFoundException: > /mnt_APS/solr/solrHomeFull/data/index/segments_cje (No such file or > directory) > at java.io.RandomAccessFile.open(Native Method) > at java.io.RandomAccessFile.(RandomAccessFile.java:212) > at > org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.(FSDirectory.java:630) > at > org.apache.lucene.store.FSDirectory$FSIndexInput.(FSDirectory.java:660) > > at > org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.(NIOFSDirectory.java:76) > > at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:63) > at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:560) > at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:224) > at 
> org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:103) > > at > org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:688) > > at > org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:123) > at org.apache.lucene.index.IndexReader.open(IndexReader.java:316) > at org.apache.lucene.index.IndexReader.open(IndexReader.java:237) > at org.apache.solr.search.SolrIndexSearcher. > > > > On Wed, May 27, 2009 at 2:32 AM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> It sounds like you had never committed while building the original >> index? Unfortunately, it's not easy to recover an index in that >> state. It's best to periodically commit if you're building such a >> large index. >> >> Do you have the original exception you hit? >> >> I'll fix CheckIndex to be more sane if it could not load any segments file. >> >> Mike >> >> On Tue, May 26, 2009 at 2:12 AM, Wang Guangchen >> wrote:
Re: Index replication without HTTP
On Wed, May 27, 2009 at 3:06 PM, Ashish P wrote: > > Hi, > I have two instances of embedded server (no http) running on a network with > two separate indexes.. > I want to replicate changes from one index to other. > Is there any way?? > EmbeddedSolrServer is meant for small scale usage -- like embedding in an application with no administrative cost etc.. Unless you yourself write some code to expose things on http, you lose the benefits of the http-based replication and distributed search. The best way for you is to setup solr on http on a container like tomcat or jetty. -- Regards, Shalin Shekhar Mangar.
Index replication without HTTP
Hi, I have two instances of embedded server (no HTTP) running on a network with two separate indexes. I want to replicate changes from one index to the other. Is there any way? Thanks, Ashish -- View this message in context: http://www.nabble.com/Index-replication-without-HTTP-tp23739156p23739156.html Sent from the Solr - User mailing list archive at Nabble.com.
using field search in morelikethis
Hi, When I'm doing a normal search I use q=test +field:somevalue&otherparams... How can I implement this with MoreLikeThis when I post the text and q is empty? I tried fq=, but that is not what I want. Thanks, Renz
Re: How to deal with hyphens in PDF documents?
Hi, My solution to this problem in Lucene was to modify Lucene's parser. There is a grammar file in Lucene (StandardTokenizer.jj, not plain Java) which defines what a token is and the types of tokens. My rule was that a soft or hard hyphen at the end of a line denotes a word which continues at the beginning of the next line. I used iText instead of PDFBox, because PDFBox was ignoring hyphens at the end of the line. That was years ago; the file is now called StandardTokenizerImpl.jflex. I don't know how to solve this in Solr, because it incorporates several Lucene jars, and it is not clear to me how to patch only one jar. Király Péter http://extensiblecatalog.org http://tesuji.eu - Original Message - From: "Bauke Jan Douma" To: Sent: Wednesday, May 27, 2009 1:55 AM Subject: Re: How to deal with hyphens in PDF documents? Otis Gospodnetic wrote on 05/26/2009 11:06 PM: Hello, You really want to fix this before indexing, so you don't index garbage. One way to fix this is to make use of dictionaries while looking at two tokens at a time (current + next). Then you might see that neither "fo" or "cus" are in the dictionary, but that "focus" is, so you might concatenate the tokens and output just one "focus" token. You'd do something similar with "fo-" and "cus". Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Bauke Jan Douma To: solr-user@lucene.apache.org Sent: Tuesday, May 26, 2009 4:42:39 PM Subject: How to deal with hyphens in PDF documents? Good day, fellow solr users, Fair warning: - I am utterly new to solr and to this mailing list (and to lucene for that matter). I have been playing with solr for about two weeks. Goal: - I would like to index several thousand OCR'd newspaper articles, stored as PDF documents. I have also been fiddling with PDFBox (tika), and with pdftotext in that regard. 
Ultimately, I would like to present search results having a URL to the original PDF, which when clicked, opens up the PDF with the search terms highlighted. Problem: hyphens (using PDFBox): Said newspaper articles are in Dutch. Now that language has the peculiarity that hyphenated words at EOL are a very common occurrence. The OCR'ed PDF's contain both soft and hard hyphens. Let's take the word 'focus' for example (focus in English), which is hyphenated as 'fo - cus', neither part of which are Dutch words by the way. Currently, in the XML search-results, using tika PDFBox, this can occur as: fo- cus (when the original PDF has a hard hyphen here, U+002D) fo cus (when the original PDF has a soft hyphen here, U+00AD) The problem is that neither of these would be found with a search term of 'focus'. I'v been googling for this for the past few days, but haven't seen this issue addressed anywhere. I must be overlooking something very obvious. Alternative? (using pdftotext): --- I was thinking of an alternative: using pdftotext to extract the content, run it through some custom filter to unhyphenate hyphenated words, and index these separately, besides the indexed original text. That way a search for those terms would yield results. With my limited knowledge and experience with solr however, presently I see that as shifting the same problem more or less, namely to where I want to present a clickable URL into the original PDF, with a search-string obtained from the solr search results (to highlight the term in the PDF). Any thoughts or pointers would be appreciated. Thanks all in advance for your time. Regards, Bauke Jan Douma Hello Otis, Understood. But wouldn't that lead to the problem that, when using the search result (taking it from the highlighting result in solr -- forgot to mention), that fragment will not be found in the PDF, since the PDF contains the hyphenated word? Oops. 
Just now I discovered that searching multiple-word strings that cross multiple lines in a PDF doesn't even work to begin with, even when there are no hyphens (evince on Ubuntu -- don't know if that works in Adobe Acrobat). That looks like an unsolved problem. Thank you for your input. Bauke Jan
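Otis's dictionary trick from earlier in this thread (join two adjacent tokens when neither half is a word but the concatenation is) can be sketched like this. WORDS is a toy stand-in for a real Dutch dictionary, and the function name is made up:

```python
# Dictionary-based de-hyphenation before indexing: look at each pair of
# adjacent tokens; if neither half is a known word but the joined form
# is, emit the joined word instead.
WORDS = {"focus", "cheap", "hotels"}  # toy stand-in for a real dictionary

def dehyphenate(tokens, words=WORDS):
    out, i = [], 0
    while i < len(tokens):
        cur = tokens[i]
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Strip a trailing hard hyphen (U+002D) or soft hyphen (U+00AD)
            joined = cur.rstrip("-\u00ad") + nxt
            if (cur.lower() not in words and nxt.lower() not in words
                    and joined.lower() in words):
                out.append(joined)
                i += 2
                continue
        out.append(cur)
        i += 1
    return out
```

This handles both the hard-hyphen case ("fo-" + "cus") and the soft-hyphen case where the hyphen was dropped entirely ("fo" + "cus"), though it does not by itself solve the separate problem of mapping the joined word back to its position in the original PDF for highlighting.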
Re: Indexing from DB connection issue
I guess it is better to copy the ClobTransformer.class alone and use the old Solr1.3 DIH On Tue, May 26, 2009 at 11:50 PM, ahammad wrote: > > I have an update: > > I played around with it some more and it seems like it's being caused by the > ClobTransformer. If I remove the 'clob="true"' from the field part in the > data-config, it works fine. > > The Solr install is a multicore one. I placed the > apache-solr-dataimporthandler-1.4-dev.jar from the nightly builds in the > {solrHome}/core1/lib directory (I only need it for the first core). Is there > something else I need to do for it to work? > > I don't recall doing an additional step when I did this a few weeks ago on > my local machine. > > Any help is appreciated. > > Regards > > > ahammad wrote: >> >> Hello all, >> >> I am tyring to index directly from an Oracle DB. This is what appears in >> the stack trace: >> >> SEVERE: Full Import failed >> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to >> execute query: select * from ARTICLE Processing Document # 1 >> at >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:186) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:143) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:43) >> at >> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >> at >> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74) >> at >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:285) >> at >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178) >> at >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136) >> at >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334) >> at >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386) >> at >> 
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377) >> Caused by: java.sql.SQLException: Closed Connection >> at >> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112) >> at >> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146) >> at >> oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:208) >> at >> oracle.jdbc.driver.PhysicalConnection.createStatement(PhysicalConnection.java:755) >> at >> org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.(JdbcDataSource.java:174) >> ... 10 more >> >> Funny thing is, the data import works on my local machine. I moved all the >> config files to another server, and I get this. I reindexed on my local >> machine immediately after in order to verify that the DB works, and it >> indexes fine. >> >> Here is my data-config file, just in case: >> >> >> > user="xxx" password="xxx"/> >> >> > transformer="ClobTransformer"> >> >> >> >> >> >> >> >> >> >> >> >> >> I am using the 1.3 release version, with the 1.4 DIH jar file for the Clob >> Transformer. What could be causing this? >> >> Cheers >> > > -- > View this message in context: > http://www.nabble.com/Indexing-from-DB-connection-issue-tp23725712p23728596.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
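The data-config quoted throughout this thread lost its XML tags in the archive. For reference, a DIH configuration of the shape being described would look roughly like the following; the two queries, clob="true", user/password, and ClobTransformer come from the messages above, while the driver string, URL, and column/entity names are placeholders:

```xml
<dataConfig>
  <!-- driver and url are placeholders; only user/password appear in the thread -->
  <dataSource driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/SID"
              user="xxx" password="xxx"/>
  <document>
    <entity name="ARTICLE" query="select * from ARTICLE"
            transformer="ClobTransformer">
      <!-- column name is a placeholder; clob="true" is from the thread -->
      <field column="CONTENT" clob="true"/>
      <entity name="ARTICLE_AUTHOR"
              query="select ID_A from ARTICLE_AUTHOR where ID_A='${ARTICLE.ID}'">
        <field column="ID_A" name="author"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```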