Random Display of result in solr
Hi to all, I have an issue while working with Solr. I am working on a blog module: a user can create many blogs, each blog can have posts, and each post can have several comments. I am implementing this module with Solr 1.4. When I fetch the blog details of a particular user, the results come back in a random order. For example, if I pass a blogId in the query, one run returns: SolrDocument1{blogTitle=New Blog, blogId=New Blog, userId=1} SolrDocument2{blogId=New Blog, postId=New Post, postTitle=New Post, postMessage=New Post Message, timestamp_post=Fri Sep 11 09:48:24 IST 2009} SolrDocument3{blogTitle=ammu blog, blogId=ammu blog, userId=1} and a second run returns: SolrDocument1{blogTitle=New Blog, blogId=New Blog, userId=1} SolrDocument2{blogTitle=ammu blog, blogId=ammu blog, userId=1} SolrDocument3{blogId=New Blog, postId=New Post, postTitle=New Post, postMessage=New Post Message, timestamp_post=Fri Sep 11 09:48:24 IST 2009}. I am using solrj, and because the order differs between runs I sometimes get an ArrayIndexOutOfBoundsException while iterating the list. When I run the code again at some other time it produces the expected order, so the list changes every time. If anybody has faced this type of problem, please share. I am also not able to get an exact match: when I fetch the blog details of a particular user by passing a blogTitle such as "rekha blog", the query returns not only "rekha blog" but also other blogs that end with "blog" (e.g. "sandhya blog"). What should I do about this? Is there a specific query I should use with solrj to get only the exact result? Waiting for your reply. Regards, Rekha.
Re: Backups using Replication
OK, this was committed on July 15, 2009; before that, "backupAfter" was called "snapshot". On Thu, Sep 10, 2009 at 10:14 PM, wojtekpia wrote: > > I'm using trunk from July 8, 2009. Do you know if it's more recent than that? > > > Noble Paul നോബിള് नोब्ळ्-2 wrote: >> >> which version of Solr are you using? the "backupAfter" name was >> introduced recently >> -- - Noble Paul | Principal Engineer | AOL | http://aol.com
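For reference, a minimal sketch (not from this thread; values are illustrative) of how the renamed parameter is wired into the 1.4 ReplicationHandler on the master:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- formerly "snapshot" in pre-July-2009 trunk builds -->
    <str name="backupAfter">optimize</str>
  </lst>
</requestHandler>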
What Tokenizerfactory/TokenFilterFactory can/should I use so a search for "wal mart" matches "walmart"(quotes not included in search or index)?
There are a lot of company names whose correct spelling people are unsure of. A few examples: 1. best buy, bestbuy 2. walmart, wal mart, wal-mart 3. Holiday Inn, HolidayInn. What TokenizerFactory and/or TokenFilterFactory should I use so that somebody typing "wal mart" (quotes not included) will find both "wal mart" and "walmart" (again, quotes not included)? Thanks, Christian
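Not an answer from this thread, but a sketch of the field type often suggested for this kind of matching: WordDelimiterFilterFactory with catenateWords="1" indexes "wal-mart" and "HolidayInn" as both their parts and the joined form, so a query of "walmart" can match; the space-separated "wal mart" case is usually layered on top with synonyms or shingles. Type and attribute values here are illustrative:

<fieldType name="text_names" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>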
Re: Default Query Type For Facet Queries
Changing basic defaults like this makes it very confusing to work with successive Solr releases, to read the wiki, etc. You can instead register a custom search request handler whose default parser is "customparser" and call it directly: http://localhost:8983/solr/custom?q=string_in_my_custom_language On 9/10/09, Stephen Duncan Jr wrote: > If using {!type=customparser} is the only way now, should I file an issue to > make the default configurable? > > -- > Stephen Duncan Jr > www.stephenduncanjr.com > > On Thu, Sep 3, 2009 at 11:23 AM, Stephen Duncan Jr > wrote: > > > We have a custom query parser plugin registered as the default for > > searches, and we'd like to have the same parser used for facet.query. > > > > Is there a way to register it as the default for FacetComponent in > > solrconfig.xml? > > > > I know I can add {!type=customparser} to each query as a workaround, but > > I'd rather register it in the config than make my code send that and strip > > it off on every facet query. > > > > -- > > Stephen Duncan Jr > > www.stephenduncanjr.com > > > -- Lance Norskog goks...@gmail.com
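A sketch of the kind of handler registration Lance describes (handler and parser names are illustrative): a handler whose defaults select the custom parser, reachable at the /custom path shown above.

<requestHandler name="/custom" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">customparser</str>
  </lst>
</requestHandler>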
Re: Extract info from parent node during data import
On Fri, Sep 11, 2009 at 6:48 AM, venn hardy wrote: > > Hi Fergus, > > When I debugged in the development console > http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport > > I had no problems. Each category/item seems to be indexed only once, and no > parent fields are available (except the category name). > > I am not entirely sure how the forEach statement works, but my interpretation > of forEach="/document/category/item | /document/category" is something like > this: > > 1. Whenever DIH encounters a /document/category it will extract the > /document/category/name field as a common field > 2. Whenever DIH encounters a /document/category/item it will extract all of > the item fields. > 3. When all fields have been encountered, save the document in Solr and go to > the next category/item /document/category/item | /document/category means there are two paths that trigger a new doc (it is possible to have more). Whenever DIH encounters the closing tag of one of those xpaths, it emits all the fields it has collected since the opening of the same tag, and afterwards clears them. Fields collected before the opening of that tag are retained. > > >> Date: Thu, 10 Sep 2009 14:19:31 +0100 >> To: solr-user@lucene.apache.org >> From: fer...@twig.me.uk >> Subject: RE: Extract info from parent node during data import >> >> >Hi Paul, >> >The forEach="/document/category/item | /document/category/name" didn't work >> >(no categoryname was stored or indexed). >> >However forEach="/document/category/item | /document/category" seems to >> >work well. I am not sure why category on its own works, but not >> >category/name... >> >But thanks for the tip. It wasn't as painful as I thought it would be. >> >Venn >> >> Hmmm, I had bother with this. Although each occurrence of >> /document/category/item causes a new Solr document to be indexed, that >> document contained all the fields from the parent element as well. >> >> Did you see this? >> >> >> >> From: noble.p...@corp.aol.com >> >> Date: Thu, 10 Sep 2009 09:58:21 +0530 >> >> Subject: Re: Extract info from parent node during data import >> >> To: solr-user@lucene.apache.org >> >> >> >> try this >> >> >> >> add two xpaths in your forEach >> >> >> >> forEach="/document/category/item | /document/category/name" >> >> >> >> and add a field as follows >> >> >> >> <field column="categoryname" xpath="/document/category/name" commonField="true"/> >> >> >> >> Please try it out and let me know. >> >> >> >> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy >> >> wrote: >> >> > >> >> > Hello, >> >> > >> >> > I am using SOLR 1.4 (from nightly build) and its URLDataSource in >> >> > conjunction with the XPathEntityProcessor. I have successfully imported >> >> > XML content, but I think I may have found a limitation when it comes to >> >> > the commonField attribute in the DataImportHandler. >> >> > >> >> > Before writing my own parser to read in a whole XML document, I thought >> >> > I'd post the question here (since I got some great advice last time). >> >> > >> >> > The bulk of my content is contained within each <item> tag. However, >> >> > each <item> has a parent called <category>, and each category has a <name> >> >> > which I would like to import. In my forEach loop I specify >> >> > /document/category/item as the collection of items I am interested in. >> >> > Is there any way to extract an element from underneath a parent node? To >> >> > be a bit more specific (see example XML below), I would like to index the >> >> > following: >> >> > >> >> > - category: Category 1; id: 1; author: Author 1 >> >> > - category: Category 1; id: 2; author: Author 2 >> >> > - category: Category 2; id: 3; author: Author 3 >> >> > - category: Category 2; id: 4; author: Author 4 >> >> > >> >> > Any ideas on how I can get to a parent node from within a child during >> >> > data import? If it can't be done, what do you suggest would be the best >> >> > way so I can keep using the DataImportHandler... would XSLT be a good >> >> > idea to 'flatten out' the structure a bit? >> >> > >> >> > Thanks >> >> > >> >> > This is what my XML document looks like: >> >> > >> >> > <document> <category> <name>Category 1</name> <item> <id>1</id> <author>Author 1</author> </item> <item> <id>2</id> <author>Author 2</author> </item> </category> <category> <name>Category 2</name> <item> <id>3</id> <author>Author 3</author> </item> <item> <id>4</id> <author>Author 4</author> </item> </category> </document> >> >> > >> >> > And this is what my dataConfig looks like: >> >> > >> >> > <dataConfig> <dataSource type="URLDataSource" name="dataSource"/> <document> <entity name="item" url="http://localhost:9080/data/20090817070752.xml" processor="XPathEntityProcessor" forEach="/document/category/item" transformer="DateFormatTransformer" stream="true" dataSource="dataSource"> <field column="categoryname" xpath="/document/category/name" commonField="true"/> <field column="id" xpath="/document/category/item/id"/> <field column="author" xpath="/document/category/item/author"/> </entity> </document> </dataConfig> >> >> -- >> >> - >> >> Noble Paul | Principal Engineer | AOL | http://aol.com
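To make the emit rule concrete, an illustrative walk-through over the sample document above, following Noble's description:

<document>
  <category>
    <name>Category 1</name>  <!-- collected; declared commonField, so retained -->
    <item>...</item>         <!-- </item> closes a forEach path: a doc is emitted
                                  with the item fields plus categoryname -->
    <item>...</item>         <!-- another doc emitted, categoryname still retained -->
  </category>                <!-- </category> closes the other forEach path: fields
                                  collected since <category> opened are emitted,
                                  then cleared -->
</document>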
Re: Facet fields and the DisMax query handler
Facets are not involved here; these are only simple searches. The DisMax parser does not use field names in the query. DisMax creates a nice simple syntax for people to type into a web browser search field, and its various parameters let you sculpt the relevance in order to tune the user experience. There are ways to intermix dismax parsing in the standard query parser syntax, but I am no expert. You can also use these field queries as filter queries; this is a hack but does work. Also, using wildcards interferes with upper/lower case handling. On 9/10/09, Villemos, Gert wrote: > > I'm trying to understand the DisMax query handler. I originally > configured it to ensure that the query was mapped onto different fields > in the documents and a boost assigned if the fields match. And that > works pretty smoothly. > > However when it comes to faceted searches the results perplex me. > Consider the following example: > > Document A: <Staff>John Doe</Staff> > > Document B: <ProjectManager>John Doe</ProjectManager> > > The following queries do not return anything: > Staff:Doe > Staff:Doe* > Staff:John > Staff:John* > > The query: > Staff:"John" > > returns documents A and B, even though document B doesn't even contain the > field 'Staff' (which is optional)! Through the "qf" field dismax has > been configured to search over the field 'ProjectManager', but I expected > the usage of a facet value would exclude the field... Looking at the > score of the documents, document A does score much higher than document > B (a factor of 20), but I would expect not to see B at all. I have changed > the dismax minimum-match configuration to 1, to ensure that all hits > with a single match are returned, without effect. I have changed the tie > to 0 with no effect. > > What am I missing here? I would like queries such as 'Staff:Doe' to > return document A, and only A. > > Cheers, > Gert. > -- Lance Norskog goks...@gmail.com
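Lance's filter-query workaround as a concrete request (host and values are illustrative): the q parameter goes through dismax, while each fq is parsed by the standard parser, so field:term syntax works there.

http://localhost:8983/solr/select?qt=dismax&q=John+Doe&fq=Staff:Doe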
Re: Re : Indexing fields dynamically
In the schema.xml file, "*_i" is defined as a wildcard pattern for the integer type. If a name-value pair holds an integer, use name_i as the field name. On 9/10/09, nourredine khadri wrote: > > Thanks for the quick reply. > > OK for dynamicFields, but how can I rename fields during indexing/search > to add the suffix corresponding to the type? > > What is the best way to do this? > > Nourredine. > > From: Yonik Seeley > To: solr-user@lucene.apache.org > Sent: Thursday, September 10, 2009, 14:24:26 > Subject: Re: Indexing fields dynamically > > On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri > wrote: > > I want to index my fields dynamically. > > > > DynamicFields don't suit my need because I don't know field names in > advance, and field types must be set > dynamically too (I need strong typing). > > This is what dynamic fields are meant for - you pick both the name and > type (from a pre-defined set of types of course) at runtime. The > suffix of the field name matches one of the dynamic fields and > essentially picks the type. > > -Yonik > http://www.lucidimagination.com > -- Lance Norskog goks...@gmail.com
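The stock example schema declares these wildcard patterns along these lines (exact type names vary between Solr 1.3 and 1.4):

<dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

A pair like ("pages", 42) is then submitted as pages_i and picks up the integer type from the pattern.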
Re: An issue with using Solr Cell and multiple files
You are right, I ran into the same thing. Windows curl gave me an error but cygwin ran without any issues. Thanks Lance Norskog-2 wrote: > > It is a windows problem (or curl, whatever). This works with > double-quotes. > > C:\Users\work\Downloads>\cygwin\home\work\curl-7.19.4\curl.exe > http://localhost:8983/solr/update --data-binary "<commit/>" -H > "Content-type:text/xml; charset=utf-8" > Single-quotes inside double-quotes should work: "<commit > waitFlush='false'/>" > > > On Tue, Sep 8, 2009 at 11:59 AM, caman > wrote: > >> >> seems to be an error with curl >> >> >> >> >> Kevin Miller-17 wrote: >> > >> > I am getting the same error message. I am running Solr on a Windows >> > machine. Is the commit command a curl command or is it a Solr command? >> > >> > >> > Kevin Miller >> > Web Services >> > >> > -Original Message- >> > From: Grant Ingersoll [mailto:gsing...@apache.org] >> > Sent: Tuesday, September 08, 2009 12:52 PM >> > To: solr-user@lucene.apache.org >> > Subject: Re: An issue with using Solr Cell and multiple files >> > >> > solr/examples/exampledocs/post.sh does: >> > curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; >> > charset=utf-8' >> > >> > Not sure if that helps or how it compares to the book. >> > >> > On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote: >> > >> >> I am using the Solr nightly build from 8/11/2009. I am able to index >> >> my documents using the Solr Cell but when I attempt to send the commit >> >> command I get an error. I am using the example found in the Solr 1.4 >> >> Enterprise Search Server book (recently released) found on page 84. >> >> It >> >> shows how to commit the changes as follows (I am showing where my files >> >> are located, not the example in the book): >> >> >> c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type: >> >> text/xml" --data-binary '<commit/>' >> >> >> >> This gives me this error: The system cannot find the file specified. >> >> >> >> I get the same error when I modify it to look like the following: >> >> >> c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit >> >> waitFlush="false"/>' >> c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type: >> >> text/xml" --data-binary '<commit/>' >> c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit/>' >> c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit/>' >> >> >> >> I am using the example configuration in Solr, so my documents are found >> >> in the exampledocs folder; also my curl program is located in the root >> >> directory, which is the reason for the way the curl command is being >> >> executed. >> >> >> >> I would appreciate any information on where to look or how to get the >> >> commit command to execute after indexing multiple files. >> >> >> >> Kevin Miller >> >> Oklahoma Tax Commission >> >> Web Services >> > >> > -- >> > Grant Ingersoll >> > http://www.lucidimagination.com/ >> > >> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) >> > using Solr/Lucene: >> > http://www.lucidimagination.com/search >> >> > > > -- > Lance Norskog > goks...@gmail.com
Re: How to Convert Lucene index files to XML Format
It is best to start off with Solr by playing around with the example in the example/ directory. Index the data in the example/exampledocs directory, do some searches, and look at the index with the admin/luke page. After that, this will be much easier. To bring your Lucene index under Solr, you have to examine the design of the Lucene index and create a matching Solr schema in solr/conf/schema.xml. On 9/10/09, busbus wrote: > > > Thanks for your reply > > On Sep 10, 2009, at 6:41 AM, busbus wrote: > > Solr defers to Lucene on reading the index. You just need to tell > > Solr whether the index is a compound file or not and make sure the > > versions are compatible. > > This part seems to be the point. > How do I make Solr read Lucene index files? > There is a tag in solrconfig.xml: <useCompoundFile>false</useCompoundFile> > Setting it to true does not seem to be working. > > What else needs to be done? > > Should I change the config file or add a new tag? > > Also, how do I check the compatibility of Lucene and Solr? > > Thanks in advance > -- Lance Norskog goks...@gmail.com
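A sketch of what "a matching Solr schema" means here (field names and types are illustrative, not from the poster's index): every field the original Lucene code wrote needs a declaration whose indexed/stored flags and analysis line up with how it was built.

<field name="id"   type="string" indexed="true" stored="true"/>
<field name="body" type="text"   indexed="true" stored="false"/>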
Re: Query regarding incremental index replication
There is only one index. The index gains new "segments", which represent new records and deletes of old records (sort of). Incremental replication copies the new segments; putting the new segments together with the previous index makes the new index. Incremental replication under rsync does work; perhaps it did not work for you. If you do not want to store the full index on the indexer, that is a problem: you will not be able to optimize the index on the indexer and ship the new index to the slaves. This has more on large-volume Solr installation design: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr On 9/9/09, Silent Surfer wrote: > > Hi, > > Currently we are using Solr 1.3 and we have the following requirement. > > As we need to process very high volumes of documents (of the order of 400 > GB per day), we are planning to separate the indexer(s) and searcher(s), so that > there won't be a performance hit. > > Our idea is to have a set of servers used only as indexers > for index creation, and then every 5 minutes or so the index will be copied to > the searchers (a set of Solr servers used only for querying). For this we tried to > use snapshooter, rsync, etc. > > But the problem with this approach is that the same index is present on both > the indexer and searcher, and hence occupies a large filesystem on each. > > What we need is a mechanism where the indexer contains only the index > for the past 5 minutes (the last indexing cycle before the snapshooter is run) and > the searcher has the accumulated (total) index, i.e. every 5 minutes we > should be able to move the entire index from indexer to searcher, and so on. > > The above scenario is slightly different from a master/slave implementation, > as on the master we want only the latest (WIP) index and the slave should contain > the entire index. > > Appreciate it if anyone can throw some light on how to achieve this. > > Thanks, > sS > -- Lance Norskog goks...@gmail.com
Re: Very slow first query
Yes, but in this case the query that I'm executing doesn't have any facets; I mean, for this query I'm not using the filter cache at all. What does it mean that "operating system cache can be significant"? That my first query loads a big chunk of the index into memory (maybe even the entire index)? On Thu, Sep 10, 2009 at 10:07 PM, Yonik Seeley wrote: > At 12M documents, operating system cache can be significant. > Also, the first time you sort or facet on a field, a field cache > instance is populated, which can take a lot of time. You can prevent > slow first queries by configuring a static warming query in > solrconfig.xml that includes the common sorts and facets. > > -Yonik > http://www.lucidimagination.com > > On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel > wrote: > > Hi! Why would the first query I execute take almost 60 seconds > to > > run, and after that no more than 50ms? I disabled all my caching to check > if > > it is the reason for the subsequent fast responses, but the same happens. > > I'm using Solr 1.3. > > Something really strange is that it doesn't happen with all the queries. > It > > happens with a query that filters some integer and string fields > joined > > by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:"CA") > (exact > > match). > > My index is around 12M documents. > > > > Thanks, > > > > Jonathan > >
Re: Very Urjent
Another, slower way is to create a spell-checking dictionary and do spelling requests on the first few characters the user types. http://wiki.apache.org/solr/SpellCheckerRequestHandler?highlight=%28spell%29%7C%28checker%29 Another way is to search against facet values with the facet.prefix feature: http://wiki.apache.org/solr/SimpleFacetParameters?highlight=%28facet%29%7C%28%2A%29#head-021d583a1430f6485c6e929930fceec3e15e1e8a All of these have the same problem: programmers are all perfect spellers, while normal people are not. None of these techniques helps normal people find homonyms. On 9/9/09, dharhsana wrote: > > > Hi Shalin Shekhar Mangar, > > I got some code from this site: > > http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ > > When I used that code in my project, I came to know that there is > no TermsComponent jar or plugin. > > Is there any other way to do autocompletion search without the terms > component? > > If so, please tell me how to implement it. > > Waiting for your reply > > Regards, > > Rekha > -- Lance Norskog goks...@gmail.com
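The facet.prefix route as a concrete request (field name is illustrative): completions for what the user has typed so far come back as facet counts.

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=title&facet.prefix=rekha&facet.limit=10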
Re: Why dismax isn't the default with 1.4 and why it doesn't support fuzzy search ?
A QueryParser is a Lucene class that parses a string into a tree of query objects. A request handler in solrconfig.xml describes a Solr RequestHandler object, whose name is bound either to a URL path or to an http parameter string. If a request handler's name is "/abc" then it is called at http://localhost:8983/solr/abc, but if there is no slash, the name "abc" is available when some other request handler is called. "Available" means that other code can look the name up: in "qt=dismax", the code that reads &qt knows that dismax is a request handler. (It all made sense when I started typing ...) On 9/9/09, Villemos, Gert wrote: > > Sorry for being a bit dim, I don't understand this. > > Looking at my default configuration for Solr, I have a request handler > named 'dismax' and a request handler named 'standard' with default="true". > I understand that I can configure the usage of this in the query using the > qt=dismax or qt=standard (... or no qt, as standard is set to default). And > if I set the 'defType=dismax' flag in the standard requesthandler then I > will use the dismax query parser per default. This far, so good. > > What I don't understand is whether a requesthandler and a queryparser are the > same thing, i.e. the configuration contains a REQUESTHANDLER with the name > 'dismax', but does not contain a QUERYPARSER with the name 'dismax'. Where > does the 'dismax' queryparser come from? Do I have to configure this extra? > Or is it there per default? Or does it come from the 'dismax' > requesthandler? > > Gert. > > -Original Message- > From: kaoul@gmail.com [mailto:kaoul@gmail.com] On Behalf Of Erwin > Sent: Wednesday, September 09, 2009 10:55 AM > To: solr-user@lucene.apache.org > Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't > support fuzzy search ? > > Hi Gert, > > &qt=dismax in the URL works with Solr 1.3 and 1.4 without further > configuration. You are right, you should find a "dismax" query parser in > solrconfig.xml by default. > > Erwin > > On Wed, Sep 9, 2009 at 7:49 AM, Villemos, Gert > wrote: > > One question on this: > > > > Do you need to explicitly configure a 'dismax' queryparser in the > > solrconfig.xml to enable this, or is a queryparser named 'dismax' > > available per default? > > > > Cheers, > > Gert. > > > > -Original Message- > > From: Chris Hostetter [mailto:hossman_luc...@fucit.org] > > Sent: Wednesday, September 02, 2009 2:44 AM > > To: solr-user@lucene.apache.org > > Subject: Re: Why dismax isn't the default with 1.4 and why it doesn't > > support fuzzy search ? > > > > : The wiki says "As of Solr 1.3, the DisMaxRequestHandler is simply > > the > > : standard request handler with the default query parser set to the > > : DisMax Query Parser (defType=dismax).". I just made a checkout of > > svn > > : and dismax doesn't seem to be the default as : > > > > that paragraph doesn't say that dismax is the "default handler" ... it > > says that using qt=dismax is the same as using qt=standard with the > > "default query parser" set to be the DisMaxQueryParser (using defType=dismax) > > > > so doing this replacement on any URL... > > > > qt=dismax => qt=standard&defType=dismax > > > > ...should produce identical results. > > > > : Secondly, I've patched solr with > > : http://issues.apache.org/jira/browse/SOLR-629 as I would like to > > have > > : fuzzy with dismax. I built it with "ant example". Now, behavior is > > : still the same, no fuzzy search with dismax (using the qt=dismax > > : parameter in the GET URL). > > > > questions/discussion of uncommitted patches is best done in the Jira > > issue where you found the patch ... that way it helps other people > > evaluate the patch, and the author of the patch is more likely to see > > your feedback. > > > > -Hoss
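The naming rule above, sketched in solrconfig.xml terms (handler names are illustrative):

<!-- path-style name: called directly at http://localhost:8983/solr/abc -->
<requestHandler name="/abc" class="solr.SearchHandler"/>

<!-- plain name: selected via qt=abc on another handler's URL -->
<requestHandler name="abc" class="solr.SearchHandler"/>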
Re: Date Faceting and Double Counting
datefield:[X TO* Y] meaning X to just below Y. This would be backwards-compatible. {} are used for other things, and lexing is a dying art. Using a * causes mistakes to trigger wildcard syntaxes, which will fail loudly. On Tue, Sep 8, 2009 at 5:20 PM, Chris Hostetter wrote: > > : I ran into that problem as well but the solution was provided to me by > : this very list :) See > : http://www.nabble.com/Range-queries-td24057317.html It's not the > : cleanest solution, but as long as you know what you're doing it's not > : that bad. > > Hmmm... yeah, that's a total hack. One of these days we really need to > fix the Lucene query parser grammar so inclusive/exclusive can be > different for the upper/lower bounds... > > datefield:[NOW/DAY TO NOW/DAY+1DAY} > > -Hoss > > -- Lance Norskog goks...@gmail.com
RE: Extract info from parent node during data import
Hi Fergus, When I debugged in the development console http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport I had no problems. Each category/item seems to be indexed only once, and no parent fields are available (except the category name). I am not entirely sure how the forEach statement works, but my interpretation of forEach="/document/category/item | /document/category" is something like this: 1. Whenever DIH encounters a /document/category it will extract the /document/category/name field as a common field 2. Whenever DIH encounters a /document/category/item it will extract all of the item fields. 3. When all fields have been encountered, save the document in Solr and go to the next category/item > Date: Thu, 10 Sep 2009 14:19:31 +0100 > To: solr-user@lucene.apache.org > From: fer...@twig.me.uk > Subject: RE: Extract info from parent node during data import > > >Hi Paul, > >The forEach="/document/category/item | /document/category/name" didn't work > >(no categoryname was stored or indexed). > >However forEach="/document/category/item | /document/category" seems to work > >well. I am not sure why category on its own works, but not category/name... > >But thanks for the tip. It wasn't as painful as I thought it would be. > >Venn > > Hmmm, I had bother with this. Although each occurrence of > /document/category/item causes a new Solr document to be indexed, that document contained all the fields > from the parent element as well. > > Did you see this? > > > >> From: noble.p...@corp.aol.com > >> Date: Thu, 10 Sep 2009 09:58:21 +0530 > >> Subject: Re: Extract info from parent node during data import > >> To: solr-user@lucene.apache.org > >> > >> try this > >> > >> add two xpaths in your forEach > >> > >> forEach="/document/category/item | /document/category/name" > >> > >> and add a field as follows > >> > >> <field column="categoryname" xpath="/document/category/name" commonField="true"/> > >> > >> Please try it out and let me know. > >> > >> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy wrote: > >> > > >> > Hello, > >> > > >> > I am using SOLR 1.4 (from nightly build) and its URLDataSource in > >> > conjunction with the XPathEntityProcessor. I have successfully imported > >> > XML content, but I think I may have found a limitation when it comes to > >> > the commonField attribute in the DataImportHandler. > >> > > >> > Before writing my own parser to read in a whole XML document, I thought > >> > I'd post the question here (since I got some great advice last time). > >> > > >> > The bulk of my content is contained within each <item> tag. However, > >> > each <item> has a parent called <category>, and each category has a <name> > >> > which I would like to import. In my forEach loop I specify > >> > /document/category/item as the collection of items I am interested in. > >> > Is there any way to extract an element from underneath a parent node? To > >> > be a bit more specific (see example XML below), I would like to index the > >> > following: > >> > > >> > - category: Category 1; id: 1; author: Author 1 > >> > - category: Category 1; id: 2; author: Author 2 > >> > - category: Category 2; id: 3; author: Author 3 > >> > - category: Category 2; id: 4; author: Author 4 > >> > > >> > Any ideas on how I can get to a parent node from within a child during > >> > data import? If it can't be done, what do you suggest would be the best > >> > way so I can keep using the DataImportHandler... would XSLT be a good > >> > idea to 'flatten out' the structure a bit? > >> > > >> > Thanks > >> > > >> > This is what my XML document looks like: > >> > > >> > <document> <category> <name>Category 1</name> <item> <id>1</id> <author>Author 1</author> </item> <item> <id>2</id> <author>Author 2</author> </item> </category> <category> <name>Category 2</name> <item> <id>3</id> <author>Author 3</author> </item> <item> <id>4</id> <author>Author 4</author> </item> </category> </document> > >> > > >> > And this is what my dataConfig looks like: > >> > > >> > <dataConfig> <dataSource type="URLDataSource" name="dataSource"/> <document> <entity name="item" url="http://localhost:9080/data/20090817070752.xml" processor="XPathEntityProcessor" forEach="/document/category/item" transformer="DateFormatTransformer" stream="true" dataSource="dataSource"> <field column="categoryname" xpath="/document/category/name" commonField="true"/> <field column="id" xpath="/document/category/item/id"/> <field column="author" xpath="/document/category/item/author"/> </entity> </document> </dataConfig> > >> > > >> > This is how I have specified my schema: > >> > > >> > <field name="id" type="string" indexed="true" stored="true" required="true" /> > >> > > >> > <uniqueKey>id</uniqueKey> > >> > <defaultSearchField>id</defaultSearchField> > >> > >> -- > >> - > >> Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Dynamically building the value of a field upon indexing
This has to be done by an UpdateRequestProcessor http://wiki.apache.org/solr/UpdateRequestProcessor On Tue, Sep 8, 2009 at 3:34 PM, Villemos, Gert wrote: > I would like to build the value of a field from the values of multiple > other fields at submission time. I.e. I would like to submit a document > with two fields holding "foo" and "baa", and would like Solr to store the > document with those two fields plus a third, aggregated field holding > "foo:baa". > > Just to complicate matters, I would like the aggregated field to be the > unique key. > > Is this possible? > > Thanks, > Gert. > -- Lance Norskog goks...@gmail.com
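A hedged sketch of such a processor (class and field names are hypothetical, not from the wiki page or this thread): it builds the unique key from two other submitted fields before the document reaches the index.

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConcatKeyProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // aggregate two submitted fields into the unique key field
        doc.setField("id", doc.getFieldValue("fieldA") + ":" + doc.getFieldValue("fieldB"));
        super.processAdd(cmd); // pass the modified document down the chain
      }
    };
  }
}

The factory is then registered in an updateRequestProcessorChain in solrconfig.xml and that chain is referenced by the update handler.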
Re: Very slow first query
At 12M documents, operating system cache can be significant. Also, the first time you sort or facet on a field, a field cache instance is populated, which can take a lot of time. You can prevent slow first queries by configuring a static warming query in solrconfig.xml that includes the common sorts and facets. -Yonik http://www.lucidimagination.com On Thu, Sep 10, 2009 at 8:55 PM, Jonathan Ariel wrote: > Hi! Why would the first query I execute take almost 60 seconds to > run, and after that no more than 50ms? I disabled all my caching to check if > it is the reason for the subsequent fast responses, but the same happens. > I'm using Solr 1.3. > Something really strange is that it doesn't happen with all the queries. It > happens with a query that filters some integer and string fields joined > by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:"CA") (exact > match). > My index is around 12M documents. > > Thanks, > > Jonathan >
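A sketch of such a warming entry in solrconfig.xml (sort and facet fields are illustrative): it runs when the first searcher opens, so the field caches are populated before any user query arrives.

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>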
Re: An issue with using Solr Cell and multiple files
It is a windows problem (or curl, whatever). This works with double-quotes. C:\Users\work\Downloads>\cygwin\home\work\curl-7.19.4\curl.exe http://localhost:8983/solr/update --data-binary "<commit/>" -H "Content-type:text/xml; charset=utf-8" Single-quotes inside double-quotes should work: "<commit waitFlush='false'/>" On Tue, Sep 8, 2009 at 11:59 AM, caman wrote: > > seems to be an error with curl > > > > > Kevin Miller-17 wrote: > > > > I am getting the same error message. I am running Solr on a Windows > > machine. Is the commit command a curl command or is it a Solr command? > > > > > > Kevin Miller > > Web Services > > > > -Original Message- > > From: Grant Ingersoll [mailto:gsing...@apache.org] > > Sent: Tuesday, September 08, 2009 12:52 PM > > To: solr-user@lucene.apache.org > > Subject: Re: An issue with using Solr Cell and multiple files > > > > solr/examples/exampledocs/post.sh does: > > curl $URL --data-binary '<commit/>' -H 'Content-type:text/xml; > > charset=utf-8' > > > > Not sure if that helps or how it compares to the book. > > > > On Sep 8, 2009, at 1:48 PM, Kevin Miller wrote: > > > >> I am using the Solr nightly build from 8/11/2009. I am able to index > >> my documents using the Solr Cell but when I attempt to send the commit > >> command I get an error. I am using the example found in the Solr 1.4 > >> Enterprise Search Server book (recently released) found on page 84. > >> It > >> shows how to commit the changes as follows (I am showing where my files > >> are located, not the example in the book): > >> > c:\curl\bin\curl http://echo12:8983/solr/update/ -H "Content-Type: > >> text/xml" --data-binary '<commit/>' > >> > >> This gives me this error: The system cannot find the file specified. > >> > >> I get the same error when I modify it to look like the following: > >> > c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit > >> waitFlush="false"/>' > c:\curl\bin\curl "http://echo12:8983/solr/update/" -H "Content-Type: > >> text/xml" --data-binary '<commit/>' > c:\curl\bin\curl http://echo12:8983/solr/update/ '<commit/>' > c:\curl\bin\curl "http://echo12:8983/solr/update/" '<commit/>' > >> > >> I am using the example configuration in Solr, so my documents are found > >> in the exampledocs folder; also my curl program is located in the root > >> directory, which is the reason for the way the curl command is being > >> executed. > >> > >> I would appreciate any information on where to look or how to get the > >> commit command to execute after indexing multiple files. > >> > >> Kevin Miller > >> Oklahoma Tax Commission > >> Web Services > > > > -- > > Grant Ingersoll > > http://www.lucidimagination.com/ > > > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > > using Solr/Lucene: > > http://www.lucidimagination.com/search > > -- Lance Norskog goks...@gmail.com
Very slow first query
Hi! Why would the first query I execute take almost 60 seconds to run, and after that no more than 50ms? I disabled all my caching to check if it is the reason for the subsequent fast responses, but the same happens. I'm using Solr 1.3. Something really strange is that it doesn't happen with all the queries. It happens with a query that filters some integer and string fields joined by an AND operator. Something like A:1 AND B:2 AND (C:3 AND D:"CA") (exact match). My index is around 12M documents. Thanks, Jonathan
Using EnglishPorterFilterFactory in code
Hello, I have a task where my user gives me 20 English dictionary words, and I have to run a program and generate a report with all the stemmed words. I have to use EnglishPorterFilterFactory and SnowballPorterFilterFactory to check which one is faster and gets the best results. Should I write a Java module and use the library which comes with Solr? Is there any code snippet which I can use? Is there any utility which Solr provides? My faint idea of how to do it is to create an EnglishPorterFilter from EnglishPorterFilterFactory by passing a tokenizer, etc. I will appreciate it if someone can give me a hint on this. thanks darniz
Re: SnowballPorterFilterFactory stemming word question
Thanks Yonik. I have a task where my user gives me 20 English dictionary words, and I have to run a program and generate a report with all the stemmed words. I have to use EnglishPorterFilterFactory and SnowballPorterFilterFactory to check which one is faster and gets the best results. Should I write a Java module and use the library which comes with Solr? Is there any code snippet which I can use? My faint idea of how to do it is to create an EnglishPorterFilter from EnglishPorterFilterFactory by passing a tokenizer, etc. I will appreciate it if someone can give me a hint on this. thanks darniz Yonik Seeley-2 wrote: > > On Mon, Sep 7, 2009 at 2:49 AM, darniz wrote: >> Does Solr provide any implementation of a dictionary stemmer? Please let >> me >> know > > The Krovetz stemmer is dictionary based (English only): > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem > > But from your original question, maybe you are concerned when the > stemmer doesn't return real words? For normal search, don't be. > At index time, words are stemmed, and then later the query is > stemmed. If the results match up, you're good. For example, a > document containing the word "machines" may stem to "machin", and then > a query of "machined" will stem to "machin" and thus match the > document. > > > -Yonik > http://www.lucidimagination.com
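Not from this thread, but a sketch of one way to drive the stemmer from plain Java (an assumption: Solr 1.4 bundles Lucene 2.9, whose contrib SnowballFilter backs both factories, so timing the stream this way approximates the comparison):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

TokenStream ts = new SnowballFilter(
    new WhitespaceTokenizer(new StringReader("machines machined machining")),
    "English");
TermAttribute term = ts.addAttribute(TermAttribute.class);
while (ts.incrementToken()) {
  System.out.println(term.term()); // prints the stemmed form of each word
}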
Re: Query runs faster without filter queries?
Thanks! I don't think I can use an unreleased version of Solr even if it's stable enough (crazy infrastructure guys), but I might be able to apply the 2 patches mentioned in the link you sent. I will try it in my local copy of Solr, see if it improves, and let you know. Thanks! On Thu, Sep 10, 2009 at 5:43 PM, Uri Boness wrote: > If I recall correctly, in Solr 1.3 there was an issue where filters didn't > really behave as they should have. Basically, if you had a query and > filters defined, the query would execute in full and only after that > would the filter be applied. AFAIK this is fixed in 1.4, where the > documents excluded by the filters are now skipped during the query > execution. > > Uri > > > Jonathan Ariel wrote: > >> Hi all! >> I'm trying to measure the query response time when using just a query and >> when using some filter queries. From what I read and understand, adding a >> filter query should improve the query response time. I used Luke to >> understand >> which fields I should use as filter queries (those that have few unique >> terms; in my case two fields with 30 and 400 unique terms). I'm using Solr >> 1.3. >> To test the query performance I disabled queryCache and >> documentCache, so I just have filterCache enabled. I did that because I >> wanted to be sure that there is no caching when I measure my queries. I >> left >> filterCache because it makes sense, since filter queries use it. >> >> When I first execute my query without filter queries it runs in 400ms; the >> next execution of the same query is around 20ms. >> When I first execute my query with filter queries it runs in 500ms; the >> next execution of the same query is around 50ms. >> >> Why does the query with filter queries run slower than the query without >> them? Shouldn't it be the other way around? >> >> My index is around 12M documents. My filterCache max size is set to 4 >> (I >> think more than enough). The fields that I use as filter queries are >> integers, >> and in my query I search over a tokenized text field. >> >> What do you think? >> >> Thanks a lot, >> >> Jonathan >> >> >
Re: Single Core or Multiple Core?
Yes, it seems like I don't need to split. I could use different commit times; in my use case committing happens too often, and I could use a different commit time per country. Your questions made me rethink the need to split into cores. Thanks On Fri, Sep 4, 2009 at 5:38 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > On Fri, Sep 4, 2009 at 4:35 AM, Jonathan Ariel wrote: > > > It seems like it is really hard to decide when the multiple-core solution > > is more appropriate. As I understand from this list and the wiki, the > > multiple-core > > feature was designed to address the need of handling different sets of > > data within the same Solr instance, where the sets of data don't need to be > > joined. > > > > Correct. It is also useful when you don't want to set up multiple boxes or > tomcats for each Solr. > > > In my case the documents are of a specific site and country. So document A > > can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 / > > Country 2, and so on. > > For the use cases of my application I will never query across countries or > > sites. I will always have to provide the country id and the site id in the > > query. > > Would you suggest splitting my data into cores? I have few sites (around 20) > > and more countries (around 90). > > Should I split my data into sites (around 20 cores) and within a core > > filter by country? Should I split by site and country (around 1800 cores)? > > What should I consider when splitting my data into multiple cores? > > > The first question is why do you want to split at all? Is the schema or > solrconfig different? Are the different sites or countries updated at > different times? Is the combined index so big that the response times jump > wildly when all the caches are thrown out because documents related to one site > or country are updated? Does warmup or optimize or replication take too much > time with one big index? > > Each core will have its own configuration files (maintenance) and you need > to set up replication separately for each core (which is a pain with the > script-based replication). Also note that by keeping all cores in one tomcat > (one JVM), a stop-the-world GC will stop all cores, which is not the case > when using separate JVMs for each index/core. > > -- > Regards, > Shalin Shekhar Mangar. >
Re: Highlighting in SolrJ?
If I set snippets to 9 and "mergeContiguous" to true, will I get the entire contents of the field with all the search terms highlighted? I don't see what good it would be just getting one line out of the whole field as a snippet. On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill wrote: > Set up the query like this to highlight a field named "content": > > SolrQuery query = new SolrQuery(); > query.setQuery("foo"); > > query.setHighlight(true).setHighlightSnippets(1); //set other params as > needed > query.setParam("hl.fl", "content"); > > QueryResponse queryResponse = getSolrServer().query(query); > > Then to get back the highlight results you need something like this: > > Iterator<SolrDocument> iter = queryResponse.getResults().iterator(); > > while (iter.hasNext()) { > SolrDocument resultDoc = iter.next(); > > String content = (String) resultDoc.getFieldValue("content"); > String id = (String) resultDoc.getFieldValue("id"); //id is the > uniqueKey field > > if (queryResponse.getHighlighting().get(id) != null) { > List<String> highlightSnippets = > queryResponse.getHighlighting().get(id).get("content"); > } > } > > Hope that gets you what you need. > > -Jay > http://www.lucidimagination.com > > On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin wrote: > >> Can somebody point me to some sample code for using highlighting in >> SolrJ? I understand the highlighted versions of the field comes in a >> separate NamedList? How does that work? >> >> -- >> http://www.linkedin.com/in/paultomblin >> > -- http://www.linkedin.com/in/paultomblin
Re: Highlighting in SolrJ?
Set up the query like this to highlight a field named "content":

SolrQuery query = new SolrQuery();
query.setQuery("foo");

query.setHighlight(true).setHighlightSnippets(1); // set other params as needed
query.setParam("hl.fl", "content");

QueryResponse queryResponse = getSolrServer().query(query);

Then to get back the highlight results you need something like this:

Iterator<SolrDocument> iter = queryResponse.getResults().iterator();

while (iter.hasNext()) {
  SolrDocument resultDoc = iter.next();

  String content = (String) resultDoc.getFieldValue("content");
  String id = (String) resultDoc.getFieldValue("id"); // id is the uniqueKey field

  if (queryResponse.getHighlighting().get(id) != null) {
    List<String> highlightSnippets = queryResponse.getHighlighting().get(id).get("content");
  }
}

Hope that gets you what you need. -Jay http://www.lucidimagination.com On Thu, Sep 10, 2009 at 3:19 PM, Paul Tomblin wrote: > Can somebody point me to some sample code for using highlighting in > SolrJ? I understand the highlighted versions of the field come in a > separate NamedList? How does that work? > > -- > http://www.linkedin.com/in/paultomblin >
Re: about SOLR-1395 integration with katta
Hi Zhong, For #2 the existing patch SOLR-1395 is a good start. It should be fairly simple to deploy indexes and distribute them to Solr Katta nodes/servers. -J On Wed, Sep 9, 2009 at 11:41 PM, Zhenyu Zhong wrote: > Jason, > > Thanks for the reply. > > In general, I would like to use Katta to handle the management overhead, such > as single point of failure, as well as the distributed index deployment. At > the same time, I still want to use the nice search features provided by Solr. > > Basically, I would like to try both approaches on the indexing part: > 1. Using Hadoop to launch MR jobs to build the index, then deploying the index to > Katta. > 2. Using the new patch SOLR-1395. > Based on my understanding, it seems to support index building with > Hadoop. I assume the index would have all the necessary information, such as > the Solr index schema, so that I can still use the nice search features provided > by Solr. > > On the search part, > I would like to try distributed search on a Solr index deployed > on Katta, if that is possible. > > I would appreciate it if you could share some thoughts with me. > > thanks > zhong > > > > On Wed, Sep 9, 2009 at 6:06 PM, Jason Rutherglen > wrote: > >> Hi Zhong, >> >> It's a very new patch. I'll update the issue as we start the >> wiki page. >> >> I've been working on indexing in Hadoop in conjunction with >> Katta, which is different (it sounds) than your use case, where >> you have prebuilt indexes you simply want to distribute using >> Katta? >> >> -J >> >> On Wed, Sep 9, 2009 at 12:33 PM, Zhenyu Zhong >> wrote: >> > Hi, >> > >> > It is really exciting to see this integration coming out. >> > May I ask what changes I need to make to be able to deploy a Solr index on >> > Katta servers? >> > Are there any tutorials? >> > >> > thanks >> > zhong >> > >> >
Highlighting in SolrJ?
Can somebody point me to some sample code for using highlighting in SolrJ? I understand the highlighted versions of the field comes in a separate NamedList? How does that work? -- http://www.linkedin.com/in/paultomblin
shards and facet_count
Hi again, I've mostly gotten the multicore setup working except for one detail. (I'm using Solr 1.3 and solr-ruby 0.0.6 in a Rails project.) I've done a few queries and I appear to be able to get hits from either core. (yeah!) I'm forming my request like this:

req = Solr::Request::Standard.new(
  :start => start,
  :rows => max,
  :sort => sort_param,
  :query => query,
  :filter_queries => filter_queries,
  :field_list => @field_list,
  :facets => {:fields => @facet_fields, :mincount => 1, :missing => true, :limit => -1},
  :highlighting => {:field_list => ['text'], :fragment_size => 600},
  :shards => @cores)

If I leave ":shards => @cores" out, then the response includes:

'facet_counts' => {
  'facet_dates' => {},
  'facet_queries' => {},
  'facet_fields' => { 'myfacet' => [ etc...], etc... }

which is what I expect. If I add the ":shards => @cores" back in (so that I'm doing the exact request above), I get:

'facet_counts' => {
  'facet_dates' => {},
  'facet_queries' => {},
  'facet_fields' => {}

so I've lost my facet information. Why would it correctly find my documents, but not report the facet info? Thanks, Paul
Re: Query runs faster without filter queries?
If I recall correctly, in Solr 1.3 there was an issue where filters didn't really behave as they should have. Basically, if you had a query and filters defined, the query would execute in full and only after that would the filter be applied. AFAIK this is fixed in 1.4, where the documents excluded by the filters are now skipped during the query execution. Uri Jonathan Ariel wrote: Hi all! I'm trying to measure the query response time when using just a query and when using some filter queries. From what I read and understand, adding a filter query should improve the query response time. I used Luke to understand which fields I should use as filter queries (those that have few unique terms; in my case two fields with 30 and 400 unique terms). I'm using Solr 1.3. To test the query performance I disabled queryCache and documentCache, so I just have filterCache enabled. I did that because I wanted to be sure that there is no caching when I measure my queries. I left filterCache because it makes sense, since filter queries use it. When I first execute my query without filter queries it runs in 400ms; the next execution of the same query is around 20ms. When I first execute my query with filter queries it runs in 500ms; the next execution of the same query is around 50ms. Why does the query with filter queries run slower than the query without them? Shouldn't it be the other way around? My index is around 12M documents. My filterCache max size is set to 4 (I think more than enough). The fields that I use as filter queries are integers, and in my query I search over a tokenized text field. What do you think? Thanks a lot, Jonathan
Re: Query runs faster without filter queries?
Try 1.4: http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/ -Yonik http://www.lucidimagination.com On Thu, Sep 10, 2009 at 4:35 PM, Jonathan Ariel wrote: > Hi all! > I'm trying to measure the query response time when using just a query and > when using some filter queries. From what I read and understand, adding a > filter query should improve the query response time. I used Luke to understand > which fields I should use as filter queries (those that have few unique > terms; in my case two fields with 30 and 400 unique terms). I'm using Solr 1.3. > To test the query performance I disabled queryCache and > documentCache, so I just have filterCache enabled. I did that because I > wanted to be sure that there is no caching when I measure my queries. I left > filterCache because it makes sense, since filter queries use it. > > When I first execute my query without filter queries it runs in 400ms; the next > execution of the same query is around 20ms. > When I first execute my query with filter queries it runs in 500ms; the next > execution of the same query is around 50ms. > > Why does the query with filter queries run slower than the query without > them? Shouldn't it be the other way around? > > My index is around 12M documents. My filterCache max size is set to 4 (I > think more than enough). The fields that I use as filter queries are integers, > and in my query I search over a tokenized text field. > > What do you think? > > Thanks a lot, > > Jonathan >
Query runs faster without filter queries?
Hi all! I'm trying to measure the query response time when using just a query and when using some filter queries. From what I read and understand, adding a filter query should improve the query response time. I used Luke to understand which fields I should use as filter queries (those that have few unique terms; in my case two fields with 30 and 400 unique terms). I'm using Solr 1.3. To test the query performance I disabled queryCache and documentCache, so I just have filterCache enabled. I did that because I wanted to be sure that there is no caching when I measure my queries. I left filterCache because it makes sense, since filter queries use it. When I first execute my query without filter queries it runs in 400ms; the next execution of the same query is around 20ms. When I first execute my query with filter queries it runs in 500ms; the next execution of the same query is around 50ms. Why does the query with filter queries run slower than the query without them? Shouldn't it be the other way around? My index is around 12M documents. My filterCache max size is set to 4 (I think more than enough). The fields that I use as filter queries are integers, and in my query I search over a tokenized text field. What do you think? Thanks a lot, Jonathan
Re: Solr http post performance seems slow - help?
On Thursday 10 September 2009 01:47:38 pm Walter Underwood wrote: > What kind of storage is used for the Solr index files? When I tested it, NFS > was 100X slower than local disk. I'm sorry - I misunderstood your question. The Solr indexes themselves are stored on local disk. The documents are retrievable (for DIH) from NFS. And, I started looking closer into this problem... both the box doing the posts, and the solr box are around 90% idle while the indexing process is running. And there is no I/O wait time. I'm now looking into possible network slowness... -Dan > > wunder > > -Original Message- > From: Dan A. Dickey [mailto:dan.dic...@savvis.net] > Sent: Thursday, September 10, 2009 11:15 AM > To: solr-user@lucene.apache.org > Cc: Walter Underwood > Subject: Re: Solr http post performance seems slow - help? > > On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote: > > How big are your documents? > > For the most part, I'm just indexing metadata that has been pulled from > the documents. I think I have currently about 40 or so fields that I'm > setting. > When the document is an actual document - pdf, doc, etc... I use the DIH > to extract stuff and also set the metadata then. > > > Is your index on local disk or network- > > mounted disk? > > I'm basically pulling the metadata info from a database and the documents > themselves are shared via NFS to the Solr indexer. > -Dan > > > > > wunder > > > > On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote: > > > > > On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey > > > wrote: > > >> I'm posting documents to Solr using http (curl) from > > >> C++/C code and am seeing approximately 3.3 - 3.4 > > >> documents per second being posted. Is this to be expected? > > > > > > No, that's very slow. > > > Are you using libcurl, or actually forking a new process for every > > > document? > > > Are you committing on every document? > > > > > > If you can, using Java would make your life much easier since you > > > could use the SolrJ client and it's binary protocol for indexing. > > > > > > -Yonik > > > http://www.lucidimagination.com > > > > > > > > > -- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN 55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net
RE: Solr http post performance seems slow - help?
What kind of storage is used for the Solr index files? When I tested it, NFS was 100X slower than local disk. wunder -Original Message- From: Dan A. Dickey [mailto:dan.dic...@savvis.net] Sent: Thursday, September 10, 2009 11:15 AM To: solr-user@lucene.apache.org Cc: Walter Underwood Subject: Re: Solr http post performance seems slow - help? On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote: > How big are your documents? For the most part, I'm just indexing metadata that has been pulled from the documents. I think I have currently about 40 or so fields that I'm setting. When the document is an actual document - pdf, doc, etc... I use the DIH to extract stuff and also set the metadata then. > Is your index on local disk or network- > mounted disk? I'm basically pulling the metadata info from a database and the documents themselves are shared via NFS to the Solr indexer. -Dan > > wunder > > On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote: > > > On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey > > wrote: > >> I'm posting documents to Solr using http (curl) from > >> C++/C code and am seeing approximately 3.3 - 3.4 > >> documents per second being posted. Is this to be expected? > > > > No, that's very slow. > > Are you using libcurl, or actually forking a new process for every > > document? > > Are you committing on every document? > > > > If you can, using Java would make your life much easier since you > > could use the SolrJ client and it's binary protocol for indexing. > > > > -Yonik > > http://www.lucidimagination.com > > > > -- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN 55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net
Re: Default Query Type For Facet Queries
If using {!type=customparser} is the only way now, should I file an issue to make the default configurable? -- Stephen Duncan Jr www.stephenduncanjr.com On Thu, Sep 3, 2009 at 11:23 AM, Stephen Duncan Jr wrote: > We have a custom query parser plugin registered as the default for > searches, and we'd like to have the same parser used for facet.query. > > Is there a way to register it as the default for FacetComponent in > solrconfig.xml? > > I know I can add {!type=customparser} to each query as a workaround, but > I'd rather register it in the config that make my code send that and strip > it off on every facet query. > > -- > Stephen Duncan Jr > www.stephenduncanjr.com >
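For the archive, the workaround under discussion looks like this on the request (customparser is the parser name from this thread; the query body is made up):

facet=true&facet.query={!type=customparser}a query in the custom language

The {!type=...} local-params prefix selects the query parser for that one facet.query only, independent of whatever parser handles q.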
Re: Solr http post performance seems slow - help?
On Thursday 10 September 2009 09:10:27 am Walter Underwood wrote: > How big are your documents? For the most part, I'm just indexing metadata that has been pulled from the documents. I think I have currently about 40 or so fields that I'm setting. When the document is an actual document - pdf, doc, etc... I use the DIH to extract stuff and also set the metadata then. > Is your index on local disk or network- > mounted disk? I'm basically pulling the metadata info from a database and the documents themselves are shared via NFS to the Solr indexer. -Dan > > wunder > > On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote: > > > On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey > > wrote: > >> I'm posting documents to Solr using http (curl) from > >> C++/C code and am seeing approximately 3.3 - 3.4 > >> documents per second being posted. Is this to be expected? > > > > No, that's very slow. > > Are you using libcurl, or actually forking a new process for every > > document? > > Are you committing on every document? > > > > If you can, using Java would make your life much easier since you > > could use the SolrJ client and it's binary protocol for indexing. > > > > -Yonik > > http://www.lucidimagination.com > > > > -- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN 55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net
Re: Solr http post performance seems slow - help?
On Thursday 10 September 2009 08:39:38 am Yonik Seeley wrote: > On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey wrote: > > I'm posting documents to Solr using http (curl) from > > C++/C code and am seeing approximately 3.3 - 3.4 > > documents per second being posted. Is this to be expected? > > No, that's very slow. > Are you using libcurl, or actually forking a new process for every document? I'm using libcurl and not forking. > Are you committing on every document? No. > If you can, using Java would make your life much easier since you > could use the SolrJ client and it's binary protocol for indexing. As much as I'd like to, I can't. At this point in time it would take far too much code restructuring and rewriting. There is a database involved, and some senseless portability library being used - though we only run on Linux at this point in time. It's just too much work to switch over to using Java, for now. -Dan -- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN 55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net
Re: query parser question
On Thu, Sep 10, 2009 at 1:28 PM, Joe Calderon wrote: > i have field called text_stem that has a kstemmer on it, im having > trouble matching wildcard searches on a word that got stemmed > > for example i index the word "america's", which according to > analysis.jsp after stemming gets indexed as "america" > > when matching i do a query like myfield:(ame*) which matches the > indexed term, this all works fine until the query becomes > myfield:(america's*) at which point it doesnt match, however if i > remove the wildcard like myfield:(america's) the it works again > > its almost like the term doesnt get stemmed when using a wildcard Correct - it's not stemmed. If it were stemmed, there would be multiple cases where that wouldn't work either. For example, with the porter stemmer, "any"->"ani" and "anywhere"->"anywher". So if you had a document with "anywhere", a prefix query of "any*" wouldn't work if you stemmed it, and would match other things like "animal". -Yonik http://www.lucidimagination.com
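A common client-side mitigation, sketched here under stated assumptions rather than taken from this thread, is to apply the cheap normalization steps yourself (lowercasing, stripping the possessive) before appending the wildcard, since wildcard terms bypass the analyzer entirely:

public class WildcardPrefixUtil {
    // Hedged sketch: mimic only the non-stemming parts of the index analysis
    // (lowercase plus dropping a trailing "'s") before building a prefix query.
    // Assumes the index chain lowercases and that kstem reduces "america's"
    // to "america"; adjust to match your own analyzer.
    public static String prefixQuery(String field, String userInput) {
        String p = userInput.toLowerCase();
        if (p.endsWith("'s")) {
            p = p.substring(0, p.length() - 2); // "america's" -> "america"
        }
        return field + ":(" + p + "*)"; // e.g. text_stem:(america*)
    }
}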
RE: OutOfMemory error on solr 1.3
SO, do you think increasing the JVM will help? We also have 500 in solrconfig.xml Originally was set to 200 Currently we give solr 1.5GB for Xms and Xmx, we use jrockit version 1.5.0_15 4 S root 12543 12495 16 76 0 - 848974 184466 Jul20 ? 8-11:12:03 /opt/bea/jrmc-3.0.3-1.5.0/bin/java -Xms1536m -Xmx1536m -Xns:128m -Xgc:gencon -Djavelin.jsp.el.elcache=4096 -Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr Francis -Original Message- From: Constantijn Visinescu [mailto:baeli...@gmail.com] Sent: Wednesday, September 09, 2009 11:35 PM To: solr-user@lucene.apache.org Subject: Re: OutOfMemory error on solr 1.3 Just wondering, how much memory are you giving your JVM ? On Thu, Sep 10, 2009 at 7:46 AM, Francis Yakin wrote: > > I am having OutOfMemory error on our slaves server, I would like to know if > someone has the same issue and have the solution for this. > > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@96cd2ffc:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890 > SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: > 441216, Num elements: 55150 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@519116e0:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@74dc52fa:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@d0dd3e28:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@b6dfa5bc:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@482b13ef:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@2309438c:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@277bd48c:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 14128832, Num elements: 3532204 > Exception in thread "[ACTIVE] ExecuteThread: '7' for queue: > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192 > Exception in thread "[ACTIVE] ExecuteThread: '8' for queue: > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192 > Exception in thread "[ACTIVE] ExecuteThread: '10' for queue: > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192 > Exception in thread "[ACTIVE] ExecuteThread: '11' for queue: > 'weblogic.kernel.Default (self-tuning)'" java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 8208, Num elements: 8192 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@41405463:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 751552, Num elements: 187884 > java.lang.OutOfMemoryError: 
allocLargeObjectOrArray - Object size: 8208, > Num elements: 8192 > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, > Num elements: 8192 > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5096, > Num elements: 2539 > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5400, > Num elements: 2690 > > deployment service message for request id "-1" from server "AdminServer". > Exception is: "java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object > size: 4368, Num elements: 2174 > SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: > 14140768, Num elements: 3535188 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@8dbcc7ab:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 5395576, Num elements: 1348890 > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 5320, > Num elements: 2649 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@4d0c6fc5:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 751560, Num elements: 187885 > java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 16400, > Num elements: 8192 > SEVERE: Error during auto-warming of > key:org.apache.solr.search.queryresult...@fb6bac19:java.lang.OutOfMemoryError: > allocLargeObjectOrArray - Object size: 14140904, Num elements: 3535222 > SEV
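For anyone hitting the same wall: the heap is set on the java command line quoted above, so one blunt experiment is simply giving JRockit more room, e.g. (numbers illustrative only, not a sizing recommendation):

java -Xms3072m -Xmx3072m -Xns:256m -Xgc:gencon -Dsolr.solr.home=/opt/apache-solr-1.3.0/example/solr ...

The complementary lever is shrinking the caches or their autowarmCount settings in solrconfig.xml, since every failure in the log above occurs during cache auto-warming.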
query parser question
I have a field called text_stem that has a kstemmer on it, and I'm having trouble matching wildcard searches on a word that got stemmed. For example, I index the word "america's", which according to analysis.jsp gets indexed after stemming as "america". When matching, I do a query like myfield:(ame*) which matches the indexed term. This all works fine until the query becomes myfield:(america's*), at which point it doesn't match; however, if I remove the wildcard, like myfield:(america's), then it works again. It's almost like the term doesn't get stemmed when using a wildcard. I'm using a 1.4 nightly. Is this the correct behaviour, or is there something I should do differently? In the meantime I've added "americas" as a protected word in the kstemmer, but I'm afraid of more edge cases that will come up. --joe
Re: TermsComponent
Thanks for the pointer. Definitely appreciate the help. Todd On Thu, Sep 10, 2009 at 11:10 AM, Jay Hill wrote: > If you need an alternative to using the TermsComponent for auto-suggest, > have a look at this blog on using EdgeNGrams instead of the TermsComponent. > > > http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ > > -Jay > http://www.lucidimagination.com > > > On Wed, Sep 9, 2009 at 3:35 PM, Todd Benge wrote: > > > We're using the StandardAnalyzer but I'm fairly certain that's not the > > issue. > > > > In fact, I there doesn't appear to be any issue with Lucene or Solr. > There > > are many instances of data in which users have removed the whitespace so > > they have a high frequency which means they bubble to the top of the > sort. > > The result is that a search for a name shows a first and last name > without > > the whitespace. > > > > One thing I've noticed is that since TermsComponent is working on a > single > > Term, there doesn't seem to be a way to query against a phrase. The same > > example as above applies, so if you're querying for name it'd be prefered > > to > > get multi-term responses back if a first name matches. > > > > Any suggestions? > > > > Thanks for all the help. It's much appreciated. > > > > Todd > > > > > > On Wed, Sep 9, 2009 at 12:11 PM, Grant Ingersoll > >wrote: > > > > > And what Analyzer are you using? I'm guessing that your words are > being > > > split up during analysis, which is why you aren't seeing whitespace. > If > > you > > > want to keep the whitespace, you will need to use the String field type > > or > > > possibly the Keyword Analyzer. > > > > > > -Grant > > > > > > > > > On Sep 9, 2009, at 11:06 AM, Todd Benge wrote: > > > > > > It's set as Field.Store.YES, Field.Index.ANALYZED. > > >> > > >> > > >> > > >> On Wed, Sep 9, 2009 at 8:15 AM, Grant Ingersoll > > >> wrote: > > >> > > >> How are you tokenizing/analyzing the field you are accessing? > > >>> > > >>> > > >>> On Sep 9, 2009, at 8:49 AM, Todd Benge wrote: > > >>> > > >>> Hi Rekha, > > >>> > > > > Here's teh link to the TermsComponent info: > > > > http://wiki.apache.org/solr/TermsComponent > > > > and another link Matt Weber did on autocompletion: > > > > > > > > > > > http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ > > > > We had to upgrade to the latest nightly to get the TermsComponent to > > work. > > > > Good Luck! > > > > Todd > > > > On Wed, Sep 9, 2009 at 5:17 AM, dharhsana < > rekha.dharsh...@gmail.com> > > wrote: > > > > > > Hi, > > > > > > I have a requirement on Autocompletion search , iam using solr 1.4. > > > > > > Could you please tell me how you worked on that Terms component > using > > > solr > > > 1.4, > > > i could'nt find terms component in solr 1.4 which i have > > downloaded,is > > > there > > > anyother configuration should be done. > > > > > > Do you have code for autocompletion, please share wih me.. > > > > > > Regards > > > Rekha > > > > > > > > > > > > tbenge wrote: > > > > > > > > >> Hi, > > >> > > >> I was looking at TermsComponent in Solr 1.4 as a way of building a > > >> autocomplete function. I have a prototype working but noticed > that > > >> terms > > >> that have whitespace in them when indexed are absent the > whitespace > > >> when > > >> returned from the TermsComponent. > > >> > > >> Any ideas on why that may be happening? Am I just missing a > > >> > > >> configuration > > > > > > option? 
> > >> > > >> Thanks, > > >> > > >> Todd > > >> > > >> > > >> > > >> -- > > > View this message in context: > > > http://www.nabble.com/TermsComponent-tp25302503p25362829.html > > > Sent from the Solr - User mailing list archive at Nabble.com. > > > > > > > > > > > > -- > > >>> Grant Ingersoll > > >>> http://www.lucidimagination.com/ > > >>> > > >>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > > using > > >>> Solr/Lucene: > > >>> http://www.lucidimagination.com/search > > >>> > > >>> > > >>> > > > -- > > > Grant Ingersoll > > > http://www.lucidimagination.com/ > > > > > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using > > > Solr/Lucene: > > > http://www.lucidimagination.com/search > > > > > > > > >
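For reference, a TermsComponent request of the kind being prototyped looks roughly like this (assuming a /terms handler is registered in solrconfig.xml and that name is the tokenized field in question):

http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=jo&terms.limit=10

Because the component walks raw index terms, a tokenized field yields single tokens ("john", "doe") rather than whole values, which is why Grant suggests a string field or the KeywordAnalyzer when complete phrases are wanted.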
Re: TermsComponent
If you need an alternative to using the TermsComponent for auto-suggest, have a look at this blog on using EdgeNGrams instead of the TermsComponent. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ -Jay http://www.lucidimagination.com On Wed, Sep 9, 2009 at 3:35 PM, Todd Benge wrote: > We're using the StandardAnalyzer but I'm fairly certain that's not the > issue. > > In fact, I there doesn't appear to be any issue with Lucene or Solr. There > are many instances of data in which users have removed the whitespace so > they have a high frequency which means they bubble to the top of the sort. > The result is that a search for a name shows a first and last name without > the whitespace. > > One thing I've noticed is that since TermsComponent is working on a single > Term, there doesn't seem to be a way to query against a phrase. The same > example as above applies, so if you're querying for name it'd be prefered > to > get multi-term responses back if a first name matches. > > Any suggestions? > > Thanks for all the help. It's much appreciated. > > Todd > > > On Wed, Sep 9, 2009 at 12:11 PM, Grant Ingersoll >wrote: > > > And what Analyzer are you using? I'm guessing that your words are being > > split up during analysis, which is why you aren't seeing whitespace. If > you > > want to keep the whitespace, you will need to use the String field type > or > > possibly the Keyword Analyzer. > > > > -Grant > > > > > > On Sep 9, 2009, at 11:06 AM, Todd Benge wrote: > > > > It's set as Field.Store.YES, Field.Index.ANALYZED. > >> > >> > >> > >> On Wed, Sep 9, 2009 at 8:15 AM, Grant Ingersoll > >> wrote: > >> > >> How are you tokenizing/analyzing the field you are accessing? > >>> > >>> > >>> On Sep 9, 2009, at 8:49 AM, Todd Benge wrote: > >>> > >>> Hi Rekha, > >>> > > Here's teh link to the TermsComponent info: > > http://wiki.apache.org/solr/TermsComponent > > and another link Matt Weber did on autocompletion: > > > > > http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/ > > We had to upgrade to the latest nightly to get the TermsComponent to > work. > > Good Luck! > > Todd > > On Wed, Sep 9, 2009 at 5:17 AM, dharhsana > wrote: > > > Hi, > > > > I have a requirement on Autocompletion search , iam using solr 1.4. > > > > Could you please tell me how you worked on that Terms component using > > solr > > 1.4, > > i could'nt find terms component in solr 1.4 which i have > downloaded,is > > there > > anyother configuration should be done. > > > > Do you have code for autocompletion, please share wih me.. > > > > Regards > > Rekha > > > > > > > > tbenge wrote: > > > > > >> Hi, > >> > >> I was looking at TermsComponent in Solr 1.4 as a way of building a > >> autocomplete function. I have a prototype working but noticed that > >> terms > >> that have whitespace in them when indexed are absent the whitespace > >> when > >> returned from the TermsComponent. > >> > >> Any ideas on why that may be happening? Am I just missing a > >> > >> configuration > > > > option? > >> > >> Thanks, > >> > >> Todd > >> > >> > >> > >> -- > > View this message in context: > > http://www.nabble.com/TermsComponent-tp25302503p25362829.html > > Sent from the Solr - User mailing list archive at Nabble.com. 
> > > > > > > > -- > >>> Grant Ingersoll > >>> http://www.lucidimagination.com/ > >>> > >>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using > >>> Solr/Lucene: > >>> http://www.lucidimagination.com/search > >>> > >>> > >>> > > -- > > Grant Ingersoll > > http://www.lucidimagination.com/ > > > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using > > Solr/Lucene: > > http://www.lucidimagination.com/search > > > > >
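To give the linked approach some shape in the archive: the heart of it is an index-time analyzer that emits front-anchored grams of the whole value. A hedged sketch of such a field type (the name and gram sizes are illustrative; see the blog post for the full recipe):

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Indexing "wal mart" into such a field stores the value's prefixes ("w", "wa", ..., "wal mart"), so partial user input matches with an ordinary query and the whitespace survives.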
Re: Pagination with solr json data
All you have to do is use the "start" and "rows" parameters to get the results you want. For example, the query for the first page of results might look like this, ?q=solr&start=0&rows=10 (other params omitted). So you'll start at the beginning (0) and get 10 results. The next page would be ?q=solr&start=10&rows=10 - start at the 10th result and display the next 10 rows. Then ?q=solr&start=20&rows=10, and so on. -Jay http://www.lucidimagination.com On Wed, Sep 9, 2009 at 12:24 PM, Elaine Li wrote: > Hi, > > What is the best way to do pagination? > > I searched around and only found some YUI utilities can do this. But > their examples don't have very close match to the pattern I have in > mind. I would like to have pretty plain display, something like the > search results from google. > > Thanks. > > Elaine >
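In general, for a 1-based page number p and n results per page, the parameters are start=(p-1)*n and rows=n; the total hit count comes back as numFound on the result element, so the number of pages is ceil(numFound/n).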
Re: Passing FuntionQuery string parameters
It looks like parseArg was added on Aug 20, 2009. I'm working with slightly older code. Thanks! Noble Paul നോബിള് नोब्ळ्-2 wrote: > > did you implement your own ValueSourceParser . the > FunctionQParser#parseArg() method supports strings > > On Wed, Sep 9, 2009 at 12:10 AM, wojtekpia wrote: >> >> Hi, >> >> I'm writing a function query to score documents based on Levenshtein >> distance from a string. I want my function calls to look like: >> >> lev(myFieldName, 'my string to match') >> >> I'm running into trouble parsing the string I want to match ('my string >> to >> match' above). It looks like all the built in support is for parsing >> field >> names and numeric values. Am I missing the string parsing support, or is >> it >> not there, and if not, why? >> >> Thanks, >> >> Wojtek >> -- >> View this message in context: >> http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25351825.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com > > -- View this message in context: http://www.nabble.com/Passing-FuntionQuery-string-parameters-tp25351825p25386910.html Sent from the Solr - User mailing list archive at Nabble.com.
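For anyone finding this thread later, a hedged sketch of what the string-argument parsing looks like once parseArg is available (LevenshteinValueSource is a hypothetical stand-in for Wojtek's own ValueSource; package names follow the 1.4 trunk of the time):

import org.apache.lucene.queryParser.ParseException;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.ValueSourceParser;
import org.apache.solr.search.function.ValueSource;

public class LevValueSourceParser extends ValueSourceParser {
    // Parses lev(myFieldName, 'my string to match')
    public ValueSource parse(FunctionQParser fp) throws ParseException {
        ValueSource field = fp.parseValueSource(); // the field argument
        String target = fp.parseArg();             // the quoted string argument
        return new LevenshteinValueSource(field, target); // hypothetical class
    }
}

It would then be registered in solrconfig.xml with something like <valueSourceParser name="lev" class="com.example.LevValueSourceParser"/>.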
Re: Backups using Replication
I'm using trunk from July 8, 2009. Do you know if it's more recent than that? Noble Paul നോബിള് नोब्ळ्-2 wrote: > > which version of Solr are you using? the "backupAfter" name was > introduced recently > -- View this message in context: http://www.nabble.com/Backups-using-Replication-tp25350083p25386886.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re : Re : Re : Pb using delta import with XPathEntityProcessor
https://issues.apache.org/jira/browse/SOLR-1421 2009/9/10 Noble Paul നോബിള് नोब्ळ् : > I guess there is a bug. I shall raise an issue. > > > > 2009/9/10 Noble Paul നോബിള് नोब्ळ् : >> everything looks fine and it beats me completely. I guess you will >> have to debug this >> >> On Thu, Sep 10, 2009 at 6:17 PM, nourredine khadri >> wrote: >>> Some fields are null but not the one parsed by XPathEntityProcessor (named >>> XML) >>> >>> 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer >>> transformRow >>> FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=, >>> ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false, >>> OFFLINEATDATE=0, ONLINEATDATE=1026307864230, STATUS=0, >>> DATESTATUS=1113905585726, MODEL=0, ACTIVATIONSTATE=true, >>> MOUNTED_SITE_IDS=null, SPECIFIC_XML=null, PUBLICATIONSTATE=true, XML=>> version="1.0" encoding="ISO-8859-1"?> >> Template="Article" Ref="10"> Empty Subtitle - Click Here >>> to edit Empty Title - Click Here to >>> edit Empty Chap¶ - Click Here to >>> edit Empty Autor - Click Here to >>> edit Empty Catchword - Click Here to >>> edit Empty InterTitle - Cl >>> ick Here to edit TextEmpty Paragraph - Click Here >>> to edit Text >>> , IDENTIFIERVERSION=5040052, CONTENTID=5040052} >>> 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder >>> buildDocument >>> GRAVE: Exception while processing: xml_document document : >>> SolrInputDocument[{keywords=keywords(1.0)={pub}, >>> fathersId=fathersId(1.0)={}, containerId=containerId(1.0)={}, >>> site=site(1.0)={12308}, archiveState=archiveState(1.0)={false}, >>> offlineAtDate=offlineAtDate(1.0)={0}, >>> onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0}, >>> dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0}, >>> activationState=activationState(1.0)={true}, >>> publicationState=publicationState(1.0)={true}, xml=xml(1.0)={>> version="1.0" encoding="ISO-8859-1"?> >> Template="Article" Ref="10"> Empty Subtitle - Click Here >>> to edit Empty Title - Click Here to >>> edit Empty Chap¶ - Click Here to edit< >>> /Parag> Empty Autor - Click Here to edit >>> Empty Catchword - Click Here to edit >>> Empty InterTitle - Click Here to edit >>> TextEmpty Paragraph - Click Here to edit >>> Text >>> }, identifierversion=identifierversion(1.0)={5040052}, >>> contentid=contentid(1.0)={5040052}}] >>> org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing >>> failed for xml, url:null rows processed:0 Processing Document # 1 >>> at >>> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) >>> at >>> org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) >>> at >>> org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) >>> at >>> org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) >>> at >>> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) >>> at >>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) >>> at >>> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354) >>> at >>> 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395) >>> at >>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) >>> Caused by: java.lang.RuntimeException: java.lang.NullPointerException >>> at >>> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92) >>> at >>> org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) >>> ... 10 more >>> Caused by: java.lang.NullPointerException >>> at >>> com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245) >>> at >>> com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132) >>> at >>> com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543) >>> at >>> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604) >>> at >>> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660) >>> at >>> com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331) >>> at >>> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88) >>> ... 11 more >>> 10 sept. 2009 14:40:34 org.apa
Re: Re : Re : Re : Pb using delta import with XPathEntityProcessor
I guess there is a bug. I shall raise an issue. 2009/9/10 Noble Paul നോബിള് नोब्ळ् : > everything looks fine and it beats me completely. I guess you will > have to debug this > > On Thu, Sep 10, 2009 at 6:17 PM, nourredine khadri > wrote: >> Some fields are null but not the one parsed by XPathEntityProcessor (named >> XML) >> >> 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer >> transformRow >> FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=, >> ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false, >> OFFLINEATDATE=0, ONLINEATDATE=1026307864230, STATUS=0, >> DATESTATUS=1113905585726, MODEL=0, ACTIVATIONSTATE=true, >> MOUNTED_SITE_IDS=null, SPECIFIC_XML=null, PUBLICATIONSTATE=true, XML=> version="1.0" encoding="ISO-8859-1"?> > Template="Article" Ref="10"> Empty Subtitle - Click Here >> to edit Empty Title - Click Here to >> edit Empty Chap¶ - Click Here to >> edit Empty Autor - Click Here to edit >> Empty Catchword - Click Here to edit >> Empty InterTitle - Cl >> ick Here to edit TextEmpty Paragraph - Click Here >> to edit Text >> , IDENTIFIERVERSION=5040052, CONTENTID=5040052} >> 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder >> buildDocument >> GRAVE: Exception while processing: xml_document document : >> SolrInputDocument[{keywords=keywords(1.0)={pub}, >> fathersId=fathersId(1.0)={}, containerId=containerId(1.0)={}, >> site=site(1.0)={12308}, archiveState=archiveState(1.0)={false}, >> offlineAtDate=offlineAtDate(1.0)={0}, >> onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0}, >> dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0}, >> activationState=activationState(1.0)={true}, >> publicationState=publicationState(1.0)={true}, xml=xml(1.0)={> version="1.0" encoding="ISO-8859-1"?> > Template="Article" Ref="10"> Empty Subtitle - Click Here >> to edit Empty Title - Click Here to >> edit Empty Chap¶ - Click Here to edit< >> /Parag> Empty Autor - Click Here to edit >> Empty Catchword - Click Here to edit >> Empty InterTitle - Click Here to edit >> TextEmpty Paragraph - Click Here to edit >> Text >> }, identifierversion=identifierversion(1.0)={5040052}, >> contentid=contentid(1.0)={5040052}}] >> org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing >> failed for xml, url:null rows processed:0 Processing Document # 1 >> at >> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) >> at >> org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) >> at >> org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) >> at >> org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) >> at >> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) >> at >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) >> at >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) >> at >> org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) >> at >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) >> at >> org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354) >> at >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395) >> at >> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) >> 
Caused by: java.lang.RuntimeException: java.lang.NullPointerException >> at >> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92) >> at >> org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) >> ... 10 more >> Caused by: java.lang.NullPointerException >> at >> com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245) >> at >> com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132) >> at >> com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543) >> at >> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604) >> at >> com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660) >> at >> com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331) >> at >> org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88) >> ... 11 more >> 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DataImporter >> doDeltaImport >> GRAVE: Delta Import Failed >> org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing >> failed for xml, url:null
Re: Field Collapsing (was Re: Schema for group/child entity setup)
The current patch definitely supports facet before and after the collapsing. Stephen Weiss wrote: I just noticed this and it reminded me of an issue I've had with collapsed faceting with an older version of the patch in Solr 1.3. Would it be possible, if we can get the terms for all the collapsed documents on a field, to then facet each collapsed document on the unique terms it has collectively? What I mean is for example: Doc 1, 2, 3 collapse together on some other field Doc 1 is the "main document" and has the "colors" blue and red Doc 2 has red Doc 3 has green For the purposes of faceting, it would be ideal in our case for faceting on color to count one each for blue, red, and green on this document (the user drills down on this value to yet another collapsed set). Right now, when you facet after collapse you just get blue and red (green is dropped because it collapses out). To the user it makes the counts seem inaccurate, like they're missing something. Instead we facet before collapsing and get an "inflated" value (which ticks 2 for red - but when you drill down, you still only get 1 because Doc 1 and Doc 2 collapse together again). Either way it's not ideal. At the time (many months ago) there was no way to account for this but it sounds like this patch could make it possible, maybe. Thanks! -- Steve On Sep 5, 2009, at 5:57 AM, Uri Boness wrote: There's work on the patch that is being done now which will enable you to ask for specific field values of the collapsed documents using a dedicated request parameter. This work is not committed yet to the latest patch, but will be very soon. There is of course a drawback to that as well, the collapsed documents set can be very large (depends on your data of course) in which case the returned result which includes the fields values can be rather large, which will impact performance, this is why this feature will be enabled only if you specify this extra parameter - by default no field values will be returned. AFAIK, the latest patch should work fine with the latest build. Martijn (which is the main maintainer of this patch) tries to keep it up to date with the latest builds. But I guess the safest way is to work with the nightly build of the same date as the latest patch (though I would give it a try first with the latest build). BTW, it's not an official suggestion from the Solr development team, but if you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I would go for the later. 1.4 is supposed to be released in the upcoming week or two and it bring loads of bug fixes, enhancements and extra functionality. But again, this is my personal suggestion. cheers, Uri
Facet fields and the DisMax query handler
I'm trying to understand the DisMax query handler. I originally configured it to ensure that the query was mapped onto different fields in the documents and a boost assigned if the fields match. And that works pretty smoothly. However, when it comes to faceted searches the results perplex me. Consider the following example: Document A: John Doe Document B: John Doe The following queries do not return anything: Staff:Doe Staff:Doe* Staff:John Staff:John* The query Staff:"John" returns Document A and B, even though document B doesn't even contain the field 'Staff' (which is optional)! Through the "qf" field, dismax has been configured to search over the field 'ProjectManager', but I expected the usage of a facet value would exclude the field... Looking at the score of the documents, document A does score much higher than Document B (by a factor of 20), but I would expect not to see B at all. I have changed the dismax configuration minimum match to be 1, to ensure that all hits with a single match are returned, without effect. I have changed the tie to 0 with no effect. What am I missing here? I would like queries such as 'Staff:Doe' to return document A, and only A. Cheers, Gert.
Re: solr 1.3 and multicore data directory
you do not have to make 3 copies of conf dir even in Solr1.3 you can try this ${./solr/${solr.core.name}/data} On Thu, Sep 10, 2009 at 7:55 PM, Paul Rosen wrote: > Ok. I have a workaround for now. I've duplicated the conf folder three times > and changed this line in solrconfig.xml in each folder: > > ${solr.data.dir:./solr/exhibits/data} > > I can't wait for solr 1.4! > > Noble Paul നോബിള് नोब्ळ् wrote: >> >> the dataDir is a Solr1.4 feature >> >> On Thu, Sep 10, 2009 at 1:57 AM, Paul Rosen >> wrote: >>> >>> Hi All, >>> >>> I'm trying to set up solr 1.3 to use multicore but I'm getting some >>> puzzling >>> results. My solr.xml file is: >>> >>> >>> >>> >>> >> dataDir="solr/resources/data/" /> >>> >> dataDir="solr/exhibits/data/" >>> /> >>> >> dataDir="solr/reindex_resources/data/" /> >>> >>> >>> >>> When I start up solr, everything looks normal until I get this line in >>> the >>> log: >>> >>> INFO: [resources] Opening new SolrCore at solr/resources/, >>> dataDir=./solr/data/ >>> >>> And a new folder is created ./solr/data/index with a blank index. And, of >>> course, any queries go to that blank index and not to one of my cores. >>> >>> Actually, what I'd really like is to have my directory structure look >>> like >>> this (some items removed for brevity): >>> >>> - >>> solr_1.3 >>> lib >>> solr >>> solr.xml >>> bin >>> conf >>> data >>> resources >>> index >>> exhibits >>> index >>> reindex_resources >>> index >>> start.jar >>> - >>> >>> And have all the cores share everything except an index. >>> >>> How would I set that up? >>> >>> Are there differences between 1.3 and 1.4 in this respect? >>> >>> Thanks, >>> Paul >>> >> >> >> > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
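Note for the archive: the mailing list stripped the XML element from the snippet above. What Noble most likely posted is a single shared solrconfig.xml line along these lines, so each core derives its data directory from its own name:

<dataDir>./solr/${solr.core.name}/data</dataDir>

The solr.core.name property is substituted per core when the shared config is loaded, which is what removes the need for three copies of the conf directory.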
Re: Re : Re : Re : Pb using delta import with XPathEntityProcessor
everything looks fine and it beats me completely. I guess you will have to debug this On Thu, Sep 10, 2009 at 6:17 PM, nourredine khadri wrote: > Some fields are null but not the one parsed by XPathEntityProcessor (named > XML) > > 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer > transformRow > FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=, > ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false, > OFFLINEATDATE=0, ONLINEATDATE=1026307864230, STATUS=0, > DATESTATUS=1113905585726, MODEL=0, ACTIVATIONSTATE=true, > MOUNTED_SITE_IDS=null, SPECIFIC_XML=null, PUBLICATIONSTATE=true, XML= version="1.0" encoding="ISO-8859-1"?> Ref="10"> Empty Subtitle - Click Here to > edit Empty Title - Click Here to > edit Empty Chap¶ - Click Here to > edit Empty Autor - Click Here to edit > Empty Catchword - Click Here to edit > Empty InterTitle - Cl > ick Here to edit TextEmpty Paragraph - Click Here > to edit Text > , IDENTIFIERVERSION=5040052, CONTENTID=5040052} > 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder > buildDocument > GRAVE: Exception while processing: xml_document document : > SolrInputDocument[{keywords=keywords(1.0)={pub}, fathersId=fathersId(1.0)={}, > containerId=containerId(1.0)={}, site=site(1.0)={12308}, > archiveState=archiveState(1.0)={false}, offlineAtDate=offlineAtDate(1.0)={0}, > onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0}, > dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0}, > activationState=activationState(1.0)={true}, > publicationState=publicationState(1.0)={true}, xml=xml(1.0)={ version="1.0" encoding="ISO-8859-1"?> Ref="10"> Empty Subtitle - Click Here to > edit Empty Title - Click Here to > edit Empty Chap¶ - Click Here to edit< > /Parag> Empty Autor - Click Here to edit > Empty Catchword - Click Here to edit > Empty InterTitle - Click Here to edit > TextEmpty Paragraph - Click Here to edit > Text > }, identifierversion=identifierversion(1.0)={5040052}, > contentid=contentid(1.0)={5040052}}] > org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed > for xml, url:null rows processed:0 Processing Document # 1 > at > org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) > at > org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) > at > org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) > at > org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) > at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) > at > org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) > at > org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) > Caused by: java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92) > at > 
org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) > ... 10 more > Caused by: java.lang.NullPointerException > at > com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245) > at > com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132) > at > com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543) > at > com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604) > at > com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660) > at > com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331) > at > org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88) > ... 11 more > 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DataImporter > doDeltaImport > GRAVE: Delta Import Failed > org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed > for xml, url:null rows processed:0 Processing Document # 1 > at > org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) > at > org.apache.solr.handler.dataimport.XPathEntityProces
Re: Field Collapsing (was Re: Schema for group/child entity setup)
All work and progress on this patch is done under the JIRA issue: https://issues.apache.org/jira/browse/SOLR-236 R. Tan wrote: The patch which will be committed soon will add this functionality. Where can I follow the progress of this patch? On Mon, Sep 7, 2009 at 3:38 PM, Uri Boness wrote: Great. Nice site and very similar to my requirements. thanks. So, right now, you get all field values by default? Right now, no field values are returned for the collapsed documents. The patch which will be committed soon will add this functionality. R. Tan wrote: Great. Nice site and very similar to my requirements. There's work on the patch that is being done now which will enable you to ask for specific field values of the collapsed documents using a dedicated request parameter. So, right now, you get all field values by default? On Sun, Sep 6, 2009 at 3:58 AM, Uri Boness wrote: You can check out http://www.ilocal.nl. If you search for a bank in Amsterdam then you'll see that a lot of the results are collapsed. For this we used an older version of this patch (which works on 1.3) but a lot has changed since then. We're currently using this patch on another project, but it's not live yet. Uri R. Tan wrote: Thanks Uri. Your personal suggestion is appreciated and I think I'll follow your advice. We're still early in development and 1.4 would be a good choice. I hope I can get field collapsing to work with my requirements. Do you know any live site using field collapsing already? On Sat, Sep 5, 2009 at 5:57 PM, Uri Boness wrote: There's work on the patch that is being done now which will enable you to ask for specific field values of the collapsed documents using a dedicated request parameter. This work is not committed yet to the latest patch, but will be very soon. There is of course a drawback to that as well, the collapsed documents set can be very large (depends on your data of course) in which case the returned result which includes the fields values can be rather large, which will impact performance, this is why this feature will be enabled only if you specify this extra parameter - by default no field values will be returned. AFAIK, the latest patch should work fine with the latest build. Martijn (which is the main maintainer of this patch) tries to keep it up to date with the latest builds. But I guess the safest way is to work with the nightly build of the same date as the latest patch (though I would give it a try first with the latest build). BTW, it's not an official suggestion from the Solr development team, but if you ask me, if you have to choose now whether to use 1.3 or 1.4-dev, I would go for the later. 1.4 is supposed to be released in the upcoming week or two and it bring loads of bug fixes, enhancements and extra functionality. But again, this is my personal suggestion. cheers, Uri R. Tan wrote: Okay. Thanks for giving an insight on how it works in general. Without trying it myself, are the field values for the collapsed ones also part of the results data? What is the latest build that is safe to use on a production environment? I'd probably go for that and use field collapsing. Thank you very much. On Fri, Sep 4, 2009 at 4:49 AM, Uri Boness wrote: The collapsed documents are represented by one "master" document which can be part of the normal search result (the doc list), so pagination just works as expected, meaning taking only the returned documents in account (ignoring the collapsed ones). 
As for the scoring, the "master" document is actually the document with the highest score in the collapsed group. As for Solr 1.3 compatibility... well... it's very hart to tell. All latest patch are certainly *not* 1.3 compatible (I think they're also depending on some changes in lucene which are not available for solr 1.3). I guess you'll have to try some of the old patches, but I'm not sure about their stability. cheers, Uri R. Tan wrote: Thanks Uri. How does paging and scoring work when using field collapsing? What patch works with 1.3? Is it production ready? R On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness wrote: The development on this patch is quite active. It works well for single solr instance, but distributed search (ie. shards) is not yet supported. Using this page you can group search results based on a specific field. There are two flavors of field collapsing - adjacent and non-adjacent, the former collapses only document which happen to be located next to each other in the otherwise-non-collapsed results set. The later (the non-adjacent) one collapses all documents with the same field value (regardless of their position in the otherwise-non-collapsed results set). Note, that non-adjacent performs better than adjacent one. There's currently discussion to extend this support
Re: solr 1.3 and multicore data directory
Ok. I have a workaround for now. I've duplicated the conf folder three times and changed this line in solrconfig.xml in each folder: <dataDir>${solr.data.dir:./solr/exhibits/data}</dataDir> I can't wait for solr 1.4! Noble Paul നോബിള് नोब्ळ् wrote: the dataDir is a Solr1.4 feature On Thu, Sep 10, 2009 at 1:57 AM, Paul Rosen wrote: Hi All, I'm trying to set up solr 1.3 to use multicore but I'm getting some puzzling results. My solr.xml file is: When I start up solr, everything looks normal until I get this line in the log: INFO: [resources] Opening new SolrCore at solr/resources/, dataDir=./solr/data/ And a new folder is created ./solr/data/index with a blank index. And, of course, any queries go to that blank index and not to one of my cores. Actually, what I'd really like is to have my directory structure look like this (some items removed for brevity): - solr_1.3 lib solr solr.xml bin conf data resources index exhibits index reindex_resources index start.jar - And have all the cores share everything except an index. How would I set that up? Are there differences between 1.3 and 1.4 in this respect? Thanks, Paul
Re: Solr http post performance seems slow - help?
How big are your documents? Is your index on local disk or network-mounted disk? wunder On Sep 10, 2009, at 6:39 AM, Yonik Seeley wrote: On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey wrote: I'm posting documents to Solr using http (curl) from C++/C code and am seeing approximately 3.3 - 3.4 documents per second being posted. Is this to be expected? No, that's very slow. Are you using libcurl, or actually forking a new process for every document? Are you committing on every document? If you can, using Java would make your life much easier since you could use the SolrJ client and its binary protocol for indexing. -Yonik http://www.lucidimagination.com
Re: Solr http post performance seems slow - help?
On Thu, Sep 10, 2009 at 9:13 AM, Dan A. Dickey wrote: > I'm posting documents to Solr using http (curl) from > C++/C code and am seeing approximately 3.3 - 3.4 > documents per second being posted. Is this to be expected? No, that's very slow. Are you using libcurl, or actually forking a new process for every document? Are you committing on every document? If you can, using Java would make your life much easier since you could use the SolrJ client and its binary protocol for indexing. -Yonik http://www.lucidimagination.com
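To flesh that suggestion out for the archive, a minimal SolrJ sketch along the lines Yonik describes (1.4-era API; the id and name fields are from the example schema, and the batch size is arbitrary):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Binary update format; assumes a 1.4 server with the
        // /update/javabin handler enabled (it is in the example config).
        server.setRequestWriter(new BinaryRequestWriter());

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("name", "document " + i);
            batch.add(doc);
        }
        server.add(batch); // one HTTP round trip for the whole batch
        server.commit();   // commit once at the end, never per document
    }
}

Even over plain HTTP, batching many documents per POST and committing once usually moves throughput from a handful of documents per second into the hundreds.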
RE: Extract info from parent node during data import
>Hi Paul, >The forEach="/document/category/item | /document/category/name" didn't work >(no categoryname was stored or indexed). >However forEach="/document/category/item | /document/category" seems to work >well. I am not sure why category on its own works, but not category/name... >But thanks for tip. It wasn't as painful as I thought it would be. >Venn Hmmm, I had bother with this. Although each occurance of /document/category/item causes a new solr document to indexed, that document contained all the fields from the parent element as well. Did you see this? > >> From: noble.p...@corp.aol.com >> Date: Thu, 10 Sep 2009 09:58:21 +0530 >> Subject: Re: Extract info from parent node during data import >> To: solr-user@lucene.apache.org >> >> try this >> >> add two xpaths in your forEach >> >> forEach="/document/category/item | /document/category/name" >> >> and add a field as follows >> >> > commonField="true"/> >> >> Please try it out and let me know. >> >> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy wrote: >> > >> > Hello, >> > >> > >> > >> > I am using SOLR 1.4 (from nighly build) and its URLDataSource in >> > conjunction with the XPathEntityProcessor. I have successfully imported >> > XML content, but I think I may have found a limitation when it comes to >> > the commonField attribute in the DataImportHandler. >> > >> > >> > >> > Before writing my own parser to read in a whole XML document, I thought >> > I'd post the question here (since I got some great advice last time). >> > >> > >> > >> > The bulk of my content is contained within each tag. However, each >> > item has a parent called and each category has a name which I >> > would like to import. In my forEach loop I specify the >> > /document/category/item as the collection of items I am interested in. Is >> > there anyway to extract an element from underneath a parent node? To be a >> > more more specific (see eg xml below). I would like to index the following: >> > >> > - category: Category 1; id: 1; author: Author 1 >> > >> > - category: Category 1; id: 2; author: Author 2 >> > >> > - category: Category 2; id: 3; author: Author 3 >> > >> > - category: Category 2; id: 4; author: Author 4 >> > >> > >> > >> > Any ideas on how I can get to a parent node from within a child during >> > data import? If it cant be done, what do you suggest would be the best way >> > so I can keep using the DataImportHandler... would XSLT be a good idea to >> > 'flatten out' the structure a bit? >> > >> > >> > >> > Thanks >> > >> > >> > >> > This is what my XML document looks like: >> > >> > >> > >> > Category 1 >> > >> > 1 >> > Author 1 >> > >> > >> > 2 >> > Author 2 >> > >> > >> > >> > Category 2 >> > >> > 3 >> > Author 3 >> > >> > >> > 4 >> > Author 4 >> > >> > >> > >> > >> > >> > >> > And this is what my dataConfig looks like: >> > >> > >> > >> > > > url="http://localhost:9080/data/20090817070752.xml"; >> > processor="XPathEntityProcessor" forEach="/document/category/item" >> > transformer="DateFormatTransformer" stream="true" dataSource="dataSource"> >> >> > commonField="true" /> >> > >> > >> > >> > >> > >> > >> > >> > >> > This is how I have specified my schema >> > >> > > > required="true" /> >> > >> > >> > >> > >> > id >> > id >> > >> > >> > >> > >> > >> > >> > _ >> > Need a place to rent, buy or share? Let us find your next place for you! 
>> > http://clk.atdmt.com/NMN/go/157631292/direct/01/ >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com > >_ >Get Hotmail on your iPhone Find out how here >http://windowslive.ninemsn.com.au/article.aspx?id=845706 -- === Fergus McMenemie Email:fer...@twig.me.uk Techmore Ltd Phone:(UK) 07721 376021 Unix/Mac/Intranets Analyst Programmer ===
Solr http post performance seems slow - help?
I'm posting documents to Solr using http (curl) from C++/C code and am seeing approximately 3.3 - 3.4 documents per second being posted. Is this to be expected? Granted - I understand that this depends somewhat on the machine running Solr. By the way - I'm running Solr inside JBoss. I was hoping for maybe 20 or more docs/sec, and 3 or so is quite a way from that. Also, I'm posting just a single document at a time. I once tried 5 processes each posting documents, and that slowed things down considerably. Down into the multiple (5-10) seconds per document. Does anyone have suggestions on what I can try? I'll soon have better servers installed and will be splitting the indexing work from the searching - but at this point in time, I wasn't doing indexing while searching anyway. Thanks for any and all help! -Dan -- Dan A. Dickey | Senior Software Engineer Savvis 10900 Hampshire Ave. S., Bloomington, MN 55438 Office: 952.852.4803 | Fax: 952.852.4951 E-mail: dan.dic...@savvis.net
Re : Indexing fields dynamically
Thanks for the quick reply. OK for dynamicFields, but how can I rename fields during indexing/search to add the suffix corresponding to the type? What is the best way to do this?

Nourredine.

From: Yonik Seeley
To: solr-user@lucene.apache.org
Sent: Thursday, 10 September 2009, 14:24:26
Subject: Re: Indexing fields dynamically

On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri wrote:
> I want to index my fields dynamically.
>
> DynamicFields don't suit my need because I don't know field names in advance
> and field types must be set
> dynamically too (I need strong typing).

This is what dynamic fields are meant for - you pick both the name and type (from a pre-defined set of types, of course) at runtime. The suffix of the field name matches one of the dynamic fields and essentially picks the type.

-Yonik
http://www.lucidimagination.com
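One way to do the renaming Yonik describes is to keep the application-level field names as-is and append a type-selecting suffix just before indexing. A sketch with SolrJ - the suffix table assumes dynamic fields like *_i, *_l, *_b and *_s are declared in schema.xml (as in the example schema), and the URL and field names are made up:

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SuffixingIndexer {
        // Map a runtime Java type to the dynamicField suffix that picks the
        // Solr type. Assumes *_i, *_l, *_b and *_s exist in schema.xml.
        static String suffixFor(Object value) {
            if (value instanceof Integer) return "_i";
            if (value instanceof Long)    return "_l";
            if (value instanceof Boolean) return "_b";
            return "_s";                       // fall back to string
        }

        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Field names and values known only at runtime:
            Map<String, Object> runtimeFields = new HashMap<String, Object>();
            runtimeFields.put("price", Integer.valueOf(10));
            runtimeFields.put("inStock", Boolean.TRUE);

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            for (Map.Entry<String, Object> e : runtimeFields.entrySet()) {
                // "price" is indexed as "price_i", "inStock" as "inStock_b", ...
                doc.addField(e.getKey() + suffixFor(e.getValue()), e.getValue());
            }
            server.add(doc);
            server.commit();
        }
    }

The same suffixFor mapping has to be applied when building queries (e.g. price becomes price_i in a filter like price_i:[5 TO 20]), so it is worth centralizing it in one utility shared by the indexing and search code.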
Re: Extract info from parent node during data import
in my tests both seem to be working. I had misspelt the column as "catgoryname" - is that why? Keep in mind that you get extra docs for each "category" also.

On Thu, Sep 10, 2009 at 5:53 PM, venn hardy wrote:
>
> Hi Paul,
> The forEach="/document/category/item | /document/category/name" didn't work
> (no categoryname was stored or indexed).
> However forEach="/document/category/item | /document/category" seems to work
> well. I am not sure why category on its own works, but not category/name...
> But thanks for the tip. It wasn't as painful as I thought it would be.
> Venn
>
> [earlier quoted messages trimmed]

--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
Re : Re : Re : Pb using delta import with XPathEntityProcessor
Some fields are null but not the one parsed by XPathEntityProcessor (named XML) 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.LogTransformer transformRow FIN: Map content : {KEYWORDS=pub, SPECIFIC=null, FATHERSID=, CONTAINERID=, ARCHIVEDDATE=0, SITE=12308, LANGUAGE=null, ARCHIVESTATE=false, OFFLINEATDATE=0, ONLINEATDATE=1026307864230, STATUS=0, DATESTATUS=1113905585726, MODEL=0, ACTIVATIONSTATE=true, MOUNTED_SITE_IDS=null, SPECIFIC_XML=null, PUBLICATIONSTATE=true, XML= Empty Subtitle - Click Here to edit Empty Title - Click Here to edit Empty Chap¶ - Click Here to edit Empty Autor - Click Here to edit Empty Catchword - Click Here to edit Empty InterTitle - Cl ick Here to edit TextEmpty Paragraph - Click Here to edit Text , IDENTIFIERVERSION=5040052, CONTENTID=5040052} 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DocBuilder buildDocument GRAVE: Exception while processing: xml_document document : SolrInputDocument[{keywords=keywords(1.0)={pub}, fathersId=fathersId(1.0)={}, containerId=containerId(1.0)={}, site=site(1.0)={12308}, archiveState=archiveState(1.0)={false}, offlineAtDate=offlineAtDate(1.0)={0}, onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0}, dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0}, activationState=activationState(1.0)={true}, publicationState=publicationState(1.0)={true}, xml=xml(1.0)={ Empty Subtitle - Click Here to edit Empty Title - Click Here to edit Empty Chap¶ - Click Here to edit< /Parag> Empty Autor - Click Here to edit Empty Catchword - Click Here to edit Empty InterTitle - Click Here to edit TextEmpty Paragraph - Click Here to edit Text }, identifierversion=identifierversion(1.0)={5040052}, contentid=contentid(1.0)={5040052}}] org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:null rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) ... 
10 more Caused by: java.lang.NullPointerException at com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245) at com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132) at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543) at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604) at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660) at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331) at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88) ... 11 more 10 sept. 2009 14:40:34 org.apache.solr.handler.dataimport.DataImporter doDeltaImport GRAVE: Delta Import Failed org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:null rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.h
Re: Re : Re : Pb using delta import with XPathEntityProcessor
what do you see if you keep the logTemplate="${document}"? I'm trying to figure out the contents of the map.
Re: How to Convert Lucene index files to XML Format
Thanks for your reply

> On Sep 10, 2009, at 6:41 AM, busbus wrote:
> Solr defers to Lucene on reading the index. You just need to tell
> Solr whether the index is a compound file or not and make sure the
> versions are compatible.
>

This part seems to be the point. How do I make Solr read the Lucene index files?

There is a tag in solrconfig.xml:

<useCompoundFile>false</useCompoundFile>

Setting it to true does not seem to work. What else needs to be done? Should I change the config file or add a new tag? Also, how do I check the compatibility of Lucene and Solr?

Thanks in advance
--
View this message in context: http://www.nabble.com/How-to-Convert-Lucene-index-files-to-XML-Format-tp25381017p25382367.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing fields dynamically
On Thu, Sep 10, 2009 at 5:58 AM, nourredine khadri wrote:
> I want to index my fields dynamically.
>
> DynamicFields don't suit my need because I don't know field names in advance
> and field types must be set
> dynamically too (I need strong typing).

This is what dynamic fields are meant for - you pick both the name and type (from a pre-defined set of types, of course) at runtime. The suffix of the field name matches one of the dynamic fields and essentially picks the type.

-Yonik
http://www.lucidimagination.com
Re: Solr: ERRORs at Startup
Hi Giovanni,

I am facing the same issue. Can you share some info on how you solved this puzzle?

hossman wrote:
>
> : Even setting everything to INFO through
> : http://localhost:8080/solr/admin/logging didn't help.
> :
> : But considering you do not see any bad issue here, at this time I will
> : ignore those ERROR messages :-)
>
> i would read up more on how to configure logging in JBoss.
>
> as far as i can tell, Solr is logging messages, which are getting handled
> by a logger that writes them to STDERR using a fairly standard format
> (date, class, method, level, msg) ... except some other piece of code
> seems to be reading from STDERR, and assuming anything that got written
> there is an ERROR, so it's logging those writes to stderr using a format
> with a date, a level (of ERROR), and a group or some other identifier of
> "STDERR"
>
> the problem is if you ignore them completely, you're going to miss
> noticing when you really have a problem.
>
> Like i said: figure out how to configure logging in JBoss, you might need
> to change the slf4j adapter jar or something if it can't deal with JUL
> (which is the default).
>
> : >> 10:51:20,525 INFO [TomcatDeployment] deploy, ctxPath=/solr
> : >> 10:51:20,617 ERROR [STDERR] Mar 13, 2009 10:51:20 AM
> : >> org.apache.solr.servlet.SolrDispatchFilter init
> : >> INFO: SolrDispatchFilter.init()
>
> -Hoss
>

--
View this message in context: http://www.nabble.com/Solr%3A-ERRORs-at-Startup-tp22493300p25382340.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Extract info from parent node during data import
Hi Paul, The forEach="/document/category/item | /document/category/name" didn't work (no categoryname was stored or indexed). However forEach="/document/category/item | /document/category" seems to work well. I am not sure why category on its own works, but not category/name... But thanks for tip. It wasn't as painful as I thought it would be. Venn > From: noble.p...@corp.aol.com > Date: Thu, 10 Sep 2009 09:58:21 +0530 > Subject: Re: Extract info from parent node during data import > To: solr-user@lucene.apache.org > > try this > > add two xpaths in your forEach > > forEach="/document/category/item | /document/category/name" > > and add a field as follows > > commonField="true"/> > > Please try it out and let me know. > > On Thu, Sep 10, 2009 at 7:30 AM, venn hardy wrote: > > > > Hello, > > > > > > > > I am using SOLR 1.4 (from nighly build) and its URLDataSource in > > conjunction with the XPathEntityProcessor. I have successfully imported XML > > content, but I think I may have found a limitation when it comes to the > > commonField attribute in the DataImportHandler. > > > > > > > > Before writing my own parser to read in a whole XML document, I thought I'd > > post the question here (since I got some great advice last time). > > > > > > > > The bulk of my content is contained within each tag. However, each > > item has a parent called and each category has a name which I > > would like to import. In my forEach loop I specify the > > /document/category/item as the collection of items I am interested in. Is > > there anyway to extract an element from underneath a parent node? To be a > > more more specific (see eg xml below). I would like to index the following: > > > > - category: Category 1; id: 1; author: Author 1 > > > > - category: Category 1; id: 2; author: Author 2 > > > > - category: Category 2; id: 3; author: Author 3 > > > > - category: Category 2; id: 4; author: Author 4 > > > > > > > > Any ideas on how I can get to a parent node from within a child during data > > import? If it cant be done, what do you suggest would be the best way so I > > can keep using the DataImportHandler... would XSLT be a good idea to > > 'flatten out' the structure a bit? > > > > > > > > Thanks > > > > > > > > This is what my XML document looks like: > > > > > > > > Category 1 > > > > 1 > > Author 1 > > > > > > 2 > > Author 2 > > > > > > > > Category 2 > > > > 3 > > Author 3 > > > > > > 4 > > Author 4 > > > > > > > > > > > > > > And this is what my dataConfig looks like: > > > > > > > >> url="http://localhost:9080/data/20090817070752.xml"; > > processor="XPathEntityProcessor" forEach="/document/category/item" > > transformer="DateFormatTransformer" stream="true" dataSource="dataSource"> > > > commonField="true" /> > > > > > > > > > > > > > > > > > > This is how I have specified my schema > > > >> required="true" /> > > > > > > > > > > id > > id > > > > > > > > > > > > > > _ > > Need a place to rent, buy or share? Let us find your next place for you! > > http://clk.atdmt.com/NMN/go/157631292/direct/01/ > > > > -- > - > Noble Paul | Principal Engineer| AOL | http://aol.com _ Get Hotmail on your iPhone Find out how here http://windowslive.ninemsn.com.au/article.aspx?id=845706
Re : Re : Pb using delta import with XPathEntityProcessor
That's the case. The field is not null. 10 sept. 2009 14:10:54 org.apache.solr.handler.dataimport.LogTransformer transformRow FIN: id : 5040052 - Xml content : Empty Subtitle - Click Here to edit Empty Title - Click Here to edit Empty Chap¶ - Click Here to edit Empty Autor - Click Here to edit Empty Catchword - Click Here to edit Empty InterTitle - Click Here to edit TextEmpty Paragraph - Click Here to edit Text 10 sept. 2009 14:10:54 org.apache.solr.handler.dataimport.DocBuilder buildDocument GRAVE: Exception while processing: xml_document document : SolrInputDocument[{keywords=keywords(1.0)={pub}, fathersId=fathersId(1.0)={}, containerId=containerId(1.0)={}, site=site(1.0)={12308}, archiveState=archiveState(1.0)={false}, offlineAtDate=offlineAtDate(1.0)={0}, onlineAtDate=onlineAtDate(1.0)={1026307864230}, status=status(1.0)={0}, dateStatus=dateStatus(1.0)={1113905585726}, model=model(1.0)={0}, activationState=activationState(1.0)={true}, publicationState=publicationState(1.0)={true}, xml=xml(1.0)={ Empty Subtitle - Click Here to edit Empty Title - Click Here to edit Empty Chap¶ - Click Here to edit< /Parag> Empty Autor - Click Here to edit Empty Catchword - Click Here to edit Empty InterTitle - Click Here to edit TextEmpty Paragraph - Click Here to edit Text }, identifierversion=identifierversion(1.0)={5040052}, contentid=contentid(1.0)={5040052}}] org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:null rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) ... 10 more Caused by: java.lang.NullPointerException at com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245) at com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132) at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543) at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604) at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660) at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331) at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88) ... 11 more 10 sept. 
2009 14:10:54 org.apache.solr.handler.dataimport.DataImporter doDeltaImport GRAVE: Delta Import Failed org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:null rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimpor
Re: How to Convert Lucene index files to XML Format
On Sep 10, 2009, at 6:41 AM, busbus wrote:

> Hello All, I have a set of files indexed by Lucene. Now I want to use
> the indexed files in SOLR. The .cfx and .cfs files are not readable by
> Solr, as it supports only .fds and .fdx.

Solr defers to Lucene on reading the index. You just need to tell Solr whether the index is a compound file or not and make sure the versions are compatible. What error are you getting?

> So I decided to add/update the index by just loading an XML file using
> post.jar:
>
> java -jar post.jar newFile.XML - loads the XML and updates the index.
>
> Now I want to convert all the cfx files to XML so that I can use them
> in SOLR. Advice needed.

I suppose you could walk the documents and dump them out to XML, assuming you have stored all your fields.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
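To make that walk concrete, here is one possible shape of it, written against the Lucene 2.9-era API (contemporary with Solr 1.4); the file paths, the escaping helper, and the class name are illustrative, and only *stored* fields can be recovered this way:

    import java.io.File;
    import java.io.PrintWriter;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Fieldable;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    // Walks every live document in a Lucene index and dumps its stored
    // fields as a Solr <add> file that post.jar can load.
    public class IndexToSolrXml {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
            PrintWriter out = new PrintWriter(args[1], "UTF-8");
            out.println("<add>");
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i)) continue;         // skip deleted docs
                Document d = reader.document(i);
                out.println("  <doc>");
                for (Object o : d.getFields()) {
                    Fieldable f = (Fieldable) o;
                    if (f.stringValue() == null) continue; // binary/unstored field
                    out.println("    <field name=\"" + f.name() + "\">"
                            + escape(f.stringValue()) + "</field>");
                }
                out.println("  </doc>");
            }
            out.println("</add>");
            out.close();
            reader.close();
        }

        static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

The resulting file is a standard <add> document for java -jar post.jar; any field that was indexed but not stored is simply gone and would have to be re-extracted from the original source data.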
Re: Connection refused when excecuting the query
On Thu, Sep 10, 2009 at 4:52 PM, dharhsana wrote: > > Hi to all, > when i try to execute my query i get Connection refused ,can any one please > tell me what should be done for this ,to make my solr run. > > org.apache.solr.client.solrj.SolrServerException: > java.net.ConnectException: > Connection refused: connect > org.apache.solr.client.solrj.SolrServerException: > java.net.ConnectException: > Connection refused: connect >at > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:471) >at > Your Solr server is not running at the url you have given to CommonsHttpSolrServer. Make sure you have given the correct url and Solr is actually up and running at that url. -- Regards, Shalin Shekhar Mangar.
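When debugging this kind of failure it can help to isolate SolrJ from the rest of the webapp with a two-line connectivity check. A sketch, and note the port is an assumption: the stack trace below shows a Tomcat-hosted client app, and a Solr instance on Tomcat is often at 8080 rather than Jetty's example-default 8983, so use whatever host/port/context Solr was actually deployed under:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class PingCheck {
        public static void main(String[] args) throws Exception {
            // Must match the host, port and context path Solr is deployed on.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
            System.out.println("ping status: " + server.ping().getStatus());
        }
    }

A status of 0 means the round trip worked; a ConnectException here means nothing is listening on that host and port at all.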
Connection refused when excecuting the query
Hi to all, when i try to execute my query i get Connection refused ,can any one please tell me what should be done for this ,to make my solr run. org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused: connect org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused: connect at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:471) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) at com.cloud.seviceImpl.InsertToSolrServiceImpl.getMyBlogs(InsertToSolrServiceImpl.java:214) at com.cloud.struts.action.MyBlogAction.execute(MyBlogAction.java:42) at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:425) at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:228) at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1913) at org.apache.struts.action.ActionServlet.doGet(ActionServlet.java:449) at javax.servlet.http.HttpServlet.service(HttpServlet.java:617) at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567) at org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java:394) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:595) Caused by: java.net.ConnectException: Connection refused: connect at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at java.net.Socket.connect(Socket.java:519) at java.net.Socket.connect(Socket.java:469) at java.net.Socket.(Socket.java:366) at java.net.Socket.(Socket.java:239) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:80) at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:122) at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:415) with regrards, rekha -- View this message in context: http://www.nabble.com/Connection-refused-when-excecuting-the-query-tp25381486p25381486.html Sent from the Solr - User mailing list archive at Nabble.com.
How to Convert Lucene index files to XML Format
Hello All,

I have a set of files indexed by Lucene. Now I want to use the indexed files in SOLR. The .cfx and .cfs files are not readable by Solr, as it supports only .fds and .fdx.

So I decided to add/update the index by just loading an XML file using post.jar:

java -jar post.jar newFile.XML - loads the XML and updates the index.

Now I want to convert all the cfx files to XML so that I can use them in SOLR. Advice needed.

Any other suggestions are most welcome.

- Balaji
--
View this message in context: http://www.nabble.com/How-to-Convert-Lucene-index-files-to-XML-Format-tp25381017p25381017.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Re : Pb using delta import with XPathEntityProcessor
can you just confirm that the field is not null, by adding a LogTransformer to the entity "document"?

On Thu, Sep 10, 2009 at 3:54 PM, nourredine khadri wrote:
> But why does that occur only for delta import and not for the full?
>
> I've checked my data: no xml field is null.
>
> Nourredine.
>
> Noble Paul wrote:
>>
>> I guess there was a null field and the xml parser blows up
>
>

--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Pb using delta import with XPathEntityProcessor
I just committed the fix https://issues.apache.org/jira/browse/SOLR-1420

But it does not solve your problem; it will just keep the import from throwing an exception and failing.

2009/9/10 Noble Paul നോബിള് नोब्ळ् :
> I guess there was a null field and the xml parser blows up
>
> [quoted original post trimmed - the full message and data-config appear further down in this thread]

--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
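Since SOLR-1420 only stops the exception rather than fixing the data, one possible stopgap on the DIH side is a small custom Transformer on the outer "document" entity that substitutes an empty-but-parseable document whenever the XML column comes back null. A sketch only - the column name XML and root element Contenu come from the config in this thread, while the class name and placeholder value are invented:

    import java.util.Map;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class NullXmlGuard extends Transformer {
        public Object transformRow(Map<String, Object> row, Context context) {
            Object xml = row.get("XML");
            if (xml == null || xml.toString().trim().length() == 0) {
                // Empty root matching forEach="/Contenu": parseable, yields no fields.
                row.put("XML", "<Contenu/>");
            }
            return row;
        }
    }

It would be attached with transformer="NullXmlGuard" (using the fully qualified class name once the class is on Solr's classpath); rows with a null XML column would then still index their SQL fields and simply contribute no XPath-derived fields.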
Re : Pb using delta import with XPathEntityProcessor
But why does that occur only for delta import and not for the full?

I've checked my data: no xml field is null.

Nourredine.

Noble Paul wrote:
>
> I guess there was a null field and the xml parser blows up
Re: Pb using delta import with XPathEntityProcessor
I guess there was a null field and the xml parser blows up On Thu, Sep 10, 2009 at 3:06 PM, nourredine khadri wrote: > Hi, > > I'm new solR user and for the moment it suits almost all my needs :) > > I use a fresh nightly release (09/2009) and I index a > database table using dataImportHandler. > > I try to parse an xml content field from this table using XPathEntityProcessor > and FieldReaderDataSource. Everything works fine for the full-import. > > But when I try to use the delta import (i need incremental indexation) using > "deltaQuery" > and "deltaImportQuery", it does not work and i have a stack for each > field : > > 10 sept. 2009 11:12:26 > org.apache.solr.handler.dataimport.XPathEntityProcessor initQuery > ATTENTION: Parsing failed for xml, url:null rows processed:0 > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92) > at > org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) > at > org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) > at > org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) > at > org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) > at > org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) > at > org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) > at > org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) > at > org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354) > at > org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395) > at > org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) > Caused by: java.lang.NullPointerException > at > com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245) > at > com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132) > at > com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543) > at > com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604) > at > com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660) > at > com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331) > at > org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88) > ... 11 more > > > When I remove the "delta" queries or the XPathEntityProcessor block , it's ok. 
> > my data-config.xml : > > > name="database" > type="JdbcDataSource" > driver="com.mysql.jdbc.Driver" > url="jdbc:mysql://xxx" > user="xxx" > password="xxx"/> > type="FieldReaderDataSource" name="fieldReader"/> > > > name="document" > dataSource="database" > processor="SqlEntityProcessor" > pk="CONTENTID" > query="SELECT * FROM SEARCH" > deltaImportQuery="SELECT * FROM SEARCH WHERE > CONTENTID=${dataimporter.delta.CONTENTID}" > deltaQuery="SELECT CONTENTID FROM SEARCH WHERE DATESTATUS >>= UNIX_TIMESTAMP('${dataimporter.last_index_time}')"> > > name="xml_contenu" > dataSource="fieldReader" > processor="XPathEntityProcessor" > forEach="/Contenu" > dataField="document.XML" > onError="continue"> > column="SurTitre" xpath="/Contenu/ArtCourt/SurTitre" > flatten="true"/> > column="Titre" xpath="/Contenu/ArtCourt/Titre" > flatten="true"/> > column="Chapeau" xpath="/Contenu/ArtCourt/Chapeau" > flatten="true"/> > column="Auteur" xpath="/Contenu/ArtCourt/AuteurW" > flatten="true"/> > column="Accroche" xpath="/Contenu/ArtCourt/Accroche" > flatten="true"/> > column="TxtCourt" xpath="/Contenu/ArtCourt/TxtCourt" > flatten="true"/> > column="Refs" xpath="/Contenu/ArtCourt/Refs" > flatten="true"/> > > > > > > > > the server query > :http://localhost:8080/apache-solr-nightly/dataimport?command=delta-import > > All fields are declared in the shema.xml > > Can someone help me? > > Nourredine > > > -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Indexing fields dynamically
Hello,

I want to index my fields dynamically.

DynamicFields don't suit my need because I don't know field names in advance and field types must be set dynamically too (I need strong typing).

I think the solution is to handle this programmatically, but what is the best way to do it? Which custom handler and API should I use?

Nourredine.
Re: WebLogic 10 Compatibility Issue - StackOverflowError
In testing Solr 1.4 today with WebLogic, it looks like the filters issue still exists. Adding the appropriate entries in weblogic.xml still resolves it. On first look, the header.jsp changes don't appear to be required anymore.

Would it make sense to include a weblogic.xml in the distribution to disable the filters, or should this be an exercise for users/administrators who choose to deploy this under WebLogic?

On 2/3/09 10:26 PM, Ilan Rabinovitch wrote:
We believe that the filters/forward issue is likely something specific to WebLogic. Specifically, other containers have filters disabled on forward by default, whereas WebLogic has them enabled. We don't think the small modification we had to make to header.jsp is WebLogic specific.

On 1/30/09 8:15 AM, Feak, Todd wrote:
Are the issues run into due to non-standard code in Solr, or is there some WebLogic inconsistency?

-Todd Feak

-----Original Message-----
From: news [mailto:n...@ger.gmane.org] On Behalf Of Ilan Rabinovitch
Sent: Friday, January 30, 2009 1:11 AM
To: solr-user@lucene.apache.org
Subject: Re: WebLogic 10 Compatibility Issue - StackOverflowError

I created a wiki page shortly after posting to the list:
http://wiki.apache.org/solr/SolrWeblogic

From what we could tell Solr itself was fully functional, it was only the admin tools that were failing.

Regards,
Ilan Rabinovitch
---
SCALE 7x: 2009 Southern California Linux Expo
Los Angeles, CA
http://www.socallinuxexpo.org

On 1/29/09 4:34 AM, Mark Miller wrote:
We should get this on the wiki.

- Mark

Ilan Rabinovitch wrote:
We were able to deploy Solr 1.3 on WebLogic 10.0 earlier today. Doing so required two changes:

1) Creating a weblogic.xml file in solr.war's WEB-INF directory. The weblogic.xml file is required to disable Solr's filter on FORWARD. The contents of weblogic.xml should be:

<?xml version="1.0" encoding="UTF-8"?>
<weblogic-web-app xmlns="http://www.bea.com/ns/weblogic/90"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.bea.com/ns/weblogic/90
        http://www.bea.com/ns/weblogic/90/weblogic-web-app.xsd">
  <container-descriptor>
    <filter-dispatched-requests-enabled>false</filter-dispatched-requests-enabled>
  </container-descriptor>
</weblogic-web-app>

2) Remove the pageEncoding attribute from line 1 of solr/admin/header.jsp

On 1/17/09 2:02 PM, KSY wrote:
I hit a major roadblock while trying to get Solr 1.3 running on WebLogic 10.0. A similar message was posted before ( http://www.nabble.com/Solr-1.3-stack-overflow-when-accessing-solr-admin-page-td20157873.html ) but it seems like it hasn't been resolved yet, so I'm re-posting here. I am sure I configured everything correctly because it's working fine on Resin. Has anyone successfully run Solr 1.3 on WebLogic 10.0 or higher? Thanks.

SUMMARY:
When accessing the /solr/admin page, a StackOverflowError occurs due to infinite recursion in SolrDispatchFilter.

ENVIRONMENT SETTING:
Solr 1.3.0
WebLogic 10.0
JRockit JVM 1.5

ERROR MESSAGE:
SEVERE: javax.servlet.ServletException: java.lang.StackOverflowError
at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:276)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
at weblogic.servlet.internal.RequestDispatcherImpl.invokeServlet(RequestDispatcherImpl.java:526)
at weblogic.servlet.internal.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:261)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
[the same RequestDispatcherImpl/SolrDispatchFilter/FilterChainImpl frames repeat until the stack overflows]

--
Ilan Rabinovitch
i...@fonz.net

---
SCALE 8x: 2010 Southern California Linux Expo
Feb 19-21, 2010 Los Angeles, CA
http://www.socallinuxexpo.org
Pb using delta import with XPathEntityProcessor
Hi, I'm new solR user and for the moment it suits almost all my needs :) I use a fresh nightly release (09/2009) and I index a database table using dataImportHandler. I try to parse an xml content field from this table using XPathEntityProcessor and FieldReaderDataSource. Everything works fine for the full-import. But when I try to use the delta import (i need incremental indexation) using "deltaQuery" and "deltaImportQuery", it does not work and i have a stack for each field : 10 sept. 2009 11:12:26 org.apache.solr.handler.dataimport.XPathEntityProcessor initQuery ATTENTION: Parsing failed for xml, url:null rows processed:0 java.lang.RuntimeException: java.lang.NullPointerException at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:92) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:365) at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:259) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:159) at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:354) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:395) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:372) Caused by: java.lang.NullPointerException at com.ctc.wstx.io.ReaderBootstrapper.initialLoad(ReaderBootstrapper.java:245) at com.ctc.wstx.io.ReaderBootstrapper.bootstrapInput(ReaderBootstrapper.java:132) at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:543) at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:604) at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:660) at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:331) at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:88) ... 11 more When I remove the "delta" queries or the XPathEntityProcessor block , it's ok. my data-config.xml : the server query :http://localhost:8080/apache-solr-nightly/dataimport?command=delta-import All fields are declared in the shema.xml Can someone help me? Nourredine
Re: Misleading log messages while deploying solr
Thanks Hossman

As per my understanding and investigation, if we disable STDERR in the jboss configs, we will not be able to see any STDERR coming from any of the APIs - which can be real error messages. So if we know the exact reason why this message from solr is showing up, we can block it at the solr level or maybe at the jboss level.

Any suggestion that points out a reason for this, or a solution that hides only these messages, would be really appreciated.

thanks

hossman wrote:
>
> : But the log message that is getting printed in the server console, in my case
> : jboss, is showing status as error.
> : Why is this showing as ERROR, even though things are working fine.
>
> Solr is not declaring that those messages are ERRORs, solr is just logging
> informational messages (hence the "INFO" lines) using the java logging
> framework.
>
> My guess: since the logs are getting prefixed with "ERROR [STDERR]"
> something about the way your jboss container is configured is probably
> causing those log messages to be written to STDERR, and then jboss is
> capturing the STDERR and assuming that if it went there it must be an
> "ERROR" of some kind and logging it to the console (using its own log
> format, hence the double timestamps per line)
>
> In short: jboss is doing this in response to normal logging from solr.
> you should investigate your options for configuring jboss and how it
> deals with log messages from applications.
>
>
> : 11:41:19,030 INFO [TomcatDeployer] deploy, ctxPath=/solr,
> : warUrl=.../tmp/deploy/tmp43266solr-exp.war/
> : 11:41:19,948 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
> : org.apache.solr.servlet.SolrDispatchFilter init
> : INFO: SolrDispatchFilter.init()
> : 11:41:19,975 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
> : org.apache.solr.core.SolrResourceLoader locateInstanceDir
> : INFO: No /solr/home in JNDI
> : 11:41:19,976 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
> : org.apache.solr.core.SolrResourceLoader locateInstanceDir
> : INFO: using system property solr.solr.home: C:\app\Search
> : 11:41:19,984 ERROR [STDERR] 8 Sep, 2009 11:41:19 AM
> : org.apache.solr.core.CoreContainer$Initializer initialize
> : INFO: looking for solr.xml: C:\app\Search\solr.xml
> : 11:41:20,084 ERROR [STDERR] 8 Sep, 2009 11:41:20 AM
> : org.apache.solr.core.SolrResourceLoader
> : INFO: Solr home set to 'C:\app\Search'
> : 11:41:20,142 ERROR [STDERR] 8 Sep, 2009 11:41:20 AM
> : org.apache.solr.core.SolrResourceLoader createClassLoader
> : INFO: Adding
> : 'file:/C:/app/Search/lib/apache-solr-dataimporthandler-1.3.0.jar' to Solr
> : classloader
> : 11:41:20,144 ERROR [STDERR] 8 Sep, 2009 11:41:20 AM
> : org.apache.solr.core.SolrResourceLoader createClassLoader
> : INFO: Adding 'file:/C:/app/Search/lib/jsp-2.1/' to Solr classloader
> :
> : ...
> : INFO: Reusing parent classloader
> : 11:41:21,870 ERROR [STDERR] 8 Sep, 2009 11:41:21 AM
> : org.apache.solr.core.SolrConfig
> : INFO: Loaded SolrConfig: solrconfig.xml
> : 11:41:21,909 ERROR [STDERR] 8 Sep, 2009 11:41:21 AM
> : org.apache.solr.schema.IndexSchema readSchema
> : INFO: Reading Solr Schema
> : 11:41:22,092 ERROR [STDERR] 8 Sep, 2009 11:41:22 AM
> : org.apache.solr.schema.IndexSchema readSchema
> : INFO: Schema name=contacts schema
> : 11:41:22,121 ERROR [STDERR] 8 Sep, 2009 11:41:22 AM
> : org.apache.solr.util.plugin.AbstractPluginLoader load
> : INFO: created string: org.apache.solr.schema.StrField
> :
> : .
>
> -Hoss
>

--
View this message in context: http://www.nabble.com/Misleading-log-messages-while-deploying-solr-tp25354654p25379937.html
Sent from the Solr - User mailing list archive at Nabble.com.
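If the goal, as asked above, is to hide only Solr's own INFO chatter from STDERR while leaving real STDERR errors visible, one option is to reconfigure java.util.logging (the default binding Solr ships with via slf4j-jdk14) so that the org.apache.solr logger stops propagating to the console handler and writes to its own file instead. A minimal sketch - the class name, file pattern, and size limits are arbitrary choices, not anything Solr or JBoss prescribes:

    import java.util.logging.FileHandler;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import java.util.logging.SimpleFormatter;

    public class SolrJulSetup {
        // Call once at webapp startup, before Solr logs anything.
        public static void route() throws Exception {
            Logger solr = Logger.getLogger("org.apache.solr");
            solr.setUseParentHandlers(false);  // stop propagation to the console/STDERR handler
            FileHandler fh = new FileHandler("solr.%g.log", 10 * 1024 * 1024, 3, true);
            fh.setFormatter(new SimpleFormatter());
            fh.setLevel(Level.INFO);
            solr.addHandler(fh);
        }
    }

Run it from a ServletContextListener (or a small init servlet) inside the solr webapp so it executes before Solr starts logging; JBoss's "ERROR [STDERR]" lines for Solr should then disappear while anything else written to STDERR stays visible.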
Does MoreLikeThis support sharding?
Hi, I tried MoreLikeThis (StandardRequestHandler with mlt arguments) with a single solr server and it works fine. However, when I tried the same query with sharded servers, I don't get the moreLikeThis key in the results. So my question is, Is MoreLikeThis with StandardRequestHandler supported on shards? If not, is MoreLikeThisHandler supported? Thanks, Jack
Re: Solr fitting in travel site context?
I'd look into faceting and run a test. Create a schema, index the data, and then run a query for *:* faceted by hotel to get the list of hotels you want, followed by a query that returns all documents matching a given hotel for your second use case. You're probably still going to want a SQL database to catch the reservations made, though.

In my experience implementing Solr is more work than implementing a normal SQL database, and losing the relational part of a relational database is something you have to wrap your head around to see how it affects your application. That said, Solr on my 4-year-old single-core laptop outperforms our new dual Xeon database server running IBM DB2 when it comes to running a query on a 10 million record dataset and returning the total number of documents that match. Once you get it up and running properly, and you need queries like "give me the total number of documents that match these criteria, optionally faceted by this and that", it's amazingly fast. Note that this advantage only becomes apparent when dealing with large data sets; anything under a couple hundred thousand records (a guideline - it depends heavily on the type of record) and a normal SQL server should also be able to give you the results you need near instantly.

Hope this helps ;)

On Wed, Sep 9, 2009 at 5:33 PM, Carsten Kraus wrote:
> Hi all,
>
> I'm about to develop a travel website and am wondering if Solr might fit to
> be used as the search solution.
> Being quite the opposite of a db guru and new to Solr, it's hard for me to
> judge if for my use-case a relational db should be used in favor of Solr (or
> similar indexing server). Maybe some of you guys would share their opinion
> on this?
>
> The products being searched for would be travel packages. That is: hotel
> room + flight combined into one product.
> I receive the products via a csv file, where each line defines a travel
> package with concrete departure/return, accommodation and price data.
>
> For example one csv row might represent:
> Hotel Foo in Paris, flight departing 10/10/09 from London, ending 10/20/09,
> mealplan Bar, pricing $300
> ..while another one might look like:
> Hotel Foo in Paris, flight departing 10/10/09 from Amsterdam, ending
> 10/30/09, mealplan Eggs :), pricing $400
>
> Now searches should show results in 2 steps: first step showing results
> grouped by hotel (so no hotel appears twice) and second one all
> date-airport-mealplan combinations for the hotel selected by the user in
> step 1.
>
> From some first little tests, it seems to me as if I at least would need the
> collapse patch (SOLR-236) to be used in step 1 above?!
>
> What do you think? Does Solr fit into this scenario? Thoughts?
>
> Sorry for the lengthy post & thanks a lot for any pointer!
> Carsten
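For what it's worth, the two-step flow maps directly onto plain faceting in SolrJ. A sketch - all field names (city, hotel) and values here are invented for illustration:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class HotelSearch {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            // Step 1: each hotel once, with its number of matching packages.
            SolrQuery step1 = new SolrQuery("city:paris");
            step1.setRows(0);                 // only the facet counts are needed
            step1.setFacet(true);
            step1.addFacetField("hotel");
            QueryResponse rsp = server.query(step1);
            for (FacetField.Count hotel : rsp.getFacetField("hotel").getValues()) {
                System.out.println(hotel.getName() + " (" + hotel.getCount() + " packages)");
            }

            // Step 2: every date/airport/mealplan combination for one hotel.
            SolrQuery step2 = new SolrQuery("hotel:\"Hotel Foo\"");
            QueryResponse details = server.query(step2);
            System.out.println(details.getResults().getNumFound() + " packages found");
        }
    }

Step 1 needs no collapse patch as long as a facet value (the hotel name plus its count) is enough for the list page; SOLR-236 only becomes necessary when each entry in step 1 must also show fields from one representative package document, such as the cheapest price.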