Re: Date Facet Giving Count more than actual
Thanks Hoss, the problem is resolved. The real problem was my query parameter: I was storing daysForFilter with an offset of 1 sec, and the date in the "facet.date.start" query parameter had the same offset. This was causing the overlaps, in that the facet value 2009-10-23T18:30:01 was matching both 2009-10-23T18:30:01 and 2009-10-24T18:30:01. Just changing the query to

"q=&facet=true&facet.date=daysForFilter&facet.date.start=2009-10-23T18:30:00Z&facet.date.gap=%2B1DAY&facet.date.end=2009-10-28T18:30:00Z"

works. Thanks anyway.

regards,
aakash.

On Tue, Nov 3, 2009 at 9:43 PM, Chris Hostetter wrote:
>
> : > q=&facet=true&facet.date=daysForFilter&facet.date.start=2009-10-23T18:30:01Z&facet.date.gap=%2B1DAY&facet.date.end=2009-10-28T18:30:01Z
>
> : For example I get total 18 documents for my query, and the facet count for
> : date 2009-10-23T18:30:01Z is 11; whereas there are only 5 documents
> : containing this field value. I have verified this in the result. Also when I
> : query for daysForFilter:2009-10-23T18:30:01Z, it gives me 5 results.
>
> I think you are misunderstanding what date faceting does. You have a
> facet.date.gap of +1DAY, which means the facet count is anything between
> 2009-10-23T18:30:01Z and 2009-10-24T18:30:01Z inclusively. You can verify
> this using a range query (not a term query) ...
>
> daysForFilter:[2009-10-23T18:30:01Z TO 2009-10-23T18:30:01Z+1DAY]
>
> If you only want to facet on a unique moment in time (not a range) then
> you can use facet.query ... or you can set the facet gap smaller.
>
> You should also take a look at facet.date.hardend...
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.hardend
>
> -Hoss
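Hoss's point about inclusive +1DAY buckets can be made concrete with a little date arithmetic. This is a pure-JDK sketch of the bucketing, not Solr's actual implementation:

```java
import java.time.Duration;
import java.time.Instant;

public class FacetBuckets {
    // Start of the facet bucket a timestamp falls into. Solr 1.x date
    // faceting counts each bucket as the inclusive range
    // [bucketStart TO bucketStart + gap], so a document stamped exactly on a
    // boundary is matched by two adjacent buckets.
    static Instant bucketStart(Instant start, Duration gap, Instant t) {
        long n = Duration.between(start, t).toMillis() / gap.toMillis();
        return start.plus(gap.multipliedBy(n));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2009-10-23T18:30:01Z"); // the 1-sec-offset start
        Instant doc = Instant.parse("2009-10-24T18:30:01Z");   // stamped exactly on a boundary
        Instant b = bucketStart(start, Duration.ofDays(1), doc);
        System.out.println(b); // the doc opens its own bucket...
        // ...but the inclusive range starting at 2009-10-23T18:30:01Z ends at
        // exactly this instant, so the previous bucket counts it as well.
    }
}
```

Moving facet.date.start off the stored second (as aakash did) keeps documents away from the bucket boundaries, so no document is counted twice.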
ERROR: multiple values encountered for non multiValued copy field
Hi, I'm using Solr with solrj, and when I specify a field to copy in my schema it stops working with the exception:

org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field all:
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:161)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)
        at org.apache.solr.client.solrj.SolrServer.addBean(SolrServer.java:67)

My field 'all' is defined as follows: Those fields are: If I remove the copyField everything works fine.

Any hint? TIA

--
Cheers,
Christian López Espínola
Re: leading and trailing wildcard query
> Please elaborate. What do you mean by *desrever* string? Try reading in reverse ;). Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: A. Steven Anderson > To: solr-user@lucene.apache.org > Sent: Thu, November 5, 2009 5:23:48 PM > Subject: Re: leading and trailing wildcard query > > > > > The guilt trick is not the best thing to try on public mailing lists. :) > > > > Point taken, although not my intention. I guess I have been spoiled by > quick replies and was getting to think it was a stupid question. > > Plus, I'm literally gonna get trash talk from my Oracle DBE if I can't make > this work. ;-) > > We've basically relegated Oracle to handling ingest from which we index Solr > and provide all search features. I'd hate to have to succumb to using > Oracle to service this one special query. > > > > The first thing that popped to my mind is to use 2 fields, where the second > > one contains the desrever string of the first one. > > > > Please elaborate. What do you mean by *desrever* string? > > > > The second idea is to use n-grams (if it's OK to tokenize), more > > specifically edge n-grams. > > > > Well, that's the problem. The field may have non-Latin characters that may > not have whitespace nor punctuation. > > > -- > A. Steven Anderson
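Otis's two-field trick, in miniature: index a reversed copy of the field, and rewrite a leading-wildcard query into a prefix query against it. A sketch of the idea only — the field names and analysis chain are up to you:

```java
// Sketch of the "reversed copy field" idea: store a second field holding the
// reversed text, so the leading wildcard *abc becomes the trailing wildcard
// cba* on the reversed field, which Lucene can handle as a fast prefix match.
public class ReversedField {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        String stored = "xxxabc";
        // original query: myfield:*abc  ->  reversed-field query: cba*
        String reversedValue = reverse(stored);   // "cbaxxx"
        String reversedPrefix = reverse("abc");   // "cba"
        System.out.println(reversedValue.startsWith(reversedPrefix)); // true
    }
}
```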
Re: CPU Max Utilization
You may also want to share some sample queries, your field definitions, and tell us how long a core remains 100% utilized.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message
> From: ba ba
> To: solr-user@lucene.apache.org
> Sent: Thu, November 5, 2009 9:20:13 PM
> Subject: CPU Max Utilization
>
> Greetings,
>
> I'm running a solr instance with 100 million documents in it. The index is
> 18 GB.
>
> The strange behavior I'm seeing is CPU utilization gets maxed out. I'm
> running on an 8 core machine with 32 GB of RAM. Every concurrent query I run
> on it uses up one of the cores. So, if I am running 1 concurrent query I'm
> using up the cpu of one of the cores. If I have 8 concurrent queries I'm
> using up all of the cores.
>
> Is this normal to have such a high CPU utilization? If not, what am I doing
> wrong here? The only thing I have modified is the schema.xml file to
> correspond to the documents I want to store. Everything else is just using
> the default values for all the config files.
>
> Thanks.
Re: solr query help alpha numeric and not
Avlesh, thanks, those worked. For some reason I never got your mail; found it in one of the list archives though.

thanks again
Joel

On Nov 5, 2009, at 9:08 PM, Avlesh Singh wrote:

Didn't the queries in my reply work?

Cheers
Avlesh

On Fri, Nov 6, 2009 at 4:16 AM, Joel Nylund wrote:

Hi, yes it's a string; in the case of a title it can be anything: a letter, a number, a symbol, or a multibyte char, etc. Any ideas if I wanted a query that was not a letter a-z or a number 0-9, given that it's a string?

thanks
Joel

On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote:

Hi Joel,

The ID is sent back as a string (instead of as an integer) in your example. Could this be the cause?

- Jonathan

On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote:

Hi, I have a field called firstLetterTitle. This field has 1 char; it can be anything. I need help with a few queries on this char:

1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9

I tried:

http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z

But I get back numeric results:

9
23946447

2.) I want only Numerics:

http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209

This seems to work but just checking if it's the right way.

3.) I want only English Letters:

http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z

This seems to work but just checking if it's the right way.

thanks
Joel
Re: DIH timezone offset
Could someone add this to the FAQ here? http://wiki.apache.org/solr/DataImportHandlerFaq

On Thu, Nov 5, 2009 at 8:35 PM, wrote:
> """
> DIH relies on the driver to get the date. It does not do any automatic
> conversion. Is it possible for the driver to give the date with the
> right offset?
> """
>
> I have retried a full-import after setting the Java user.timezone property to
> UTC and the dates import correctly. I've narrowed down the problem to the way
> SQL Server is returning dates. Converting it to ISO-8601 format resolves the
> issue, but I had to append a 'Z' at the end of the conversion like so:
> "select convert(varchar(30),datesentutc,126)+'Z' as date from table".
>
> Hope this is helpful to someone else. Thanks for the help.
>
> Mike

--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
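For anyone hitting the same thing from the Java side: the equivalent of Mike's SQL-side conversion is to format the timestamp in UTC with a trailing 'Z' before handing it to Solr. A plain-JDK sketch:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Render a timestamp as ISO-8601 in UTC with a trailing 'Z', the form
// Solr date fields expect, regardless of the JVM's default timezone.
public class UtcDates {
    static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // format in UTC, not local time
        return fmt.format(d);
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(new Date(0L))); // prints 1970-01-01T00:00:00Z
    }
}
```

Setting user.timezone=UTC (as Mike tried) works too, but formatting explicitly in UTC keeps the fix local to the indexing code.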
Re: specify multiple files in for DataImportHandler
You can set up multiple request handlers each with their own configuration file. For example, in addition to the config you listed you could add something like this: data-two-config.xml and so on with as many handlers as you need. -Jay http://www.lucidimagination.com On Thu, Nov 5, 2009 at 8:57 AM, javaxmlsoapdev wrote: > > class="org.apache.solr.handler.dataimport.DataImportHandler"> > > data-config.xml > > > > is there a way to list more than one files in the above > configuration? > I understand I can have multiple itself in the config but I need > to > keep two data-config files separate and still use same DIH to create one > index. > -- > View this message in context: > http://old.nabble.com/specify-multiple-files-in-%3Clst%3E-for-DataImportHandler-tp26215805p26215805.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
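Jay's example config didn't survive the archive; the shape he describes is two DataImportHandler registrations in solrconfig.xml, each pointing at its own config file. A sketch — the handler names and filenames here are illustrative:

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport-two"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-two-config.xml</str>
  </lst>
</requestHandler>
```

Both handlers write into the same index, so two separate config files can still feed one index.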
Re: CPU Max Utilization
Are you requesting results by relevance or are you sorting by a field? How many results are you requesting? Are you using real user queries (with repetition) or a flat distribution of queries?

wunder

On Nov 5, 2009, at 6:20 PM, ba ba wrote:

Greetings,

I'm running a solr instance with 100 million documents in it. The index is 18 GB.

The strange behavior I'm seeing is CPU utilization gets maxed out. I'm running on an 8 core machine with 32 GB of RAM. Every concurrent query I run on it uses up one of the cores. So, if I am running 1 concurrent query I'm using up the cpu of one of the cores. If I have 8 concurrent queries I'm using up all of the cores.

Is it normal to have such high CPU utilization? If not, what am I doing wrong here? The only thing I have modified is the schema.xml file to correspond to the documents I want to store. Everything else is just using the default values for all the config files.

Thanks.
CPU Max Utilization
Greetings,

I'm running a solr instance with 100 million documents in it. The index is 18 GB.

The strange behavior I'm seeing is CPU utilization gets maxed out. I'm running on an 8 core machine with 32 GB of RAM. Every concurrent query I run on it uses up one of the cores. So, if I am running 1 concurrent query I'm using up the cpu of one of the cores. If I have 8 concurrent queries I'm using up all of the cores.

Is it normal to have such high CPU utilization? If not, what am I doing wrong here? The only thing I have modified is the schema.xml file to correspond to the documents I want to store. Everything else is just using the default values for all the config files.

Thanks.
Re: solr query help alpha numeric and not
Didn't the queries in my reply work? Cheers Avlesh On Fri, Nov 6, 2009 at 4:16 AM, Joel Nylund wrote: > Hi yes its a string, in the case of a title, it can be anything, a letter a > number, a symbol or a multibyte char etc. > > Any ideas if I wanted a query that was not a letter a-z or a number 0-9, > given that its a string? > > thanks > Joel > > > On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote: > > Hi Joel, >> >> The ID is sent back as a string (instead of as an integer) in your >> example. Could this be the cause? >> >> - Jonathan >> >> On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote: >> >> Hi, I have a field called firstLetterTitle, this field has 1 char, it can >>> be anything, I need help with a few queries on this char: >>> >>> 1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or >>> 0-9 >>> >>> I tried: >>> >>> >>> http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z >>> >>> But I get back numeric results: >>> >>> >>> 9 >>> 23946447 >>> >>> >>> >>> 2.) I want all only Numerics: >>> >>> http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209 >>> >>> This seems to work but just checking if its the right way. >>> >>> >>> >>> 2.) I want all only English Letters: >>> >>> http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z >>> >>> This seems to work but just checking if its the right way. >>> >>> >>> thanks >>> Joel >>> >>> >> >
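If the bracketed range form was intended (the archive may have eaten the brackets from the URLs above), the three queries can be built like this. A sketch — the host and core are placeholders; note that Solr range syntax needs square brackets, and a query made only of negations needs a *:* to subtract from:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class RangeQueries {
    // Build a /select URL for a raw query string, URL-encoding the q parameter.
    static String url(String q) {
        try {
            return "http://localhost:8983/solr/select?q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // only digits:
        System.out.println(url("firstLetterTitle:[0 TO 9]"));
        // only letters:
        System.out.println(url("firstLetterTitle:[A TO Z]"));
        // neither digits nor letters -- pure negations need a *:* base:
        System.out.println(url("*:* -firstLetterTitle:[0 TO 9] -firstLetterTitle:[A TO Z]"));
    }
}
```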
Re: StreamingUpdateSolrServer - indexing process stops in a couple of hours
Seems fixed. https://issues.apache.org/jira/browse/SOLR-1543

-Yonik
http://www.lucidimagination.com

On Mon, Nov 2, 2009 at 6:05 AM, Shalin Shekhar Mangar wrote:
> I'm able to reproduce this issue consistently using JDK 1.6.0_16
>
> After an optimize is called, only one thread keeps adding documents and the
> rest wait on StreamingUpdateSolrServer line 196.
>
> On Sun, Oct 25, 2009 at 8:03 AM, Dadasheva, Olga wrote:
>
>> I am using java 1.6.0_05
>>
>> To illustrate what is happening I wrote this test program that has 10
>> threads adding a collection of documents and one thread optimizing the
>> index every 10 sec.
>>
>> I am seeing that after the first optimize there is only one thread that
>> keeps adding documents. The other ones are locked.
>>
>> In the real code I ended up adding synchronized around add and optimize to
>> avoid this.
>>
>> public static void main(String[] args) {
>>     final JettySolrRunner jetty = new JettySolrRunner("/solr", 8983);
>>     try {
>>         jetty.start();
>>         // setup the server...
>>         String url = "http://localhost:8983/solr";
>>         final StreamingUpdateSolrServer server =
>>                 new StreamingUpdateSolrServer(url, 2, 5) {
>>             @Override
>>             public void handleError(Throwable ex) {
>>                 // do something...
>>             }
>>         };
>>         server.setConnectionTimeout(1000);
>>         server.setDefaultMaxConnectionsPerHost(100);
>>         server.setMaxTotalConnections(100);
>>         int i = 0;
>>         while (i++ < 10) {
>>             new Thread("add-thread" + i) {
>>                 public void run() {
>>                     int j = 0;
>>                     while (true) {
>>                         try {
>>                             List docs = new ArrayList();
>>                             for (int n = 0; n < 50; n++) {
>>                                 SolrInputDocument doc = new SolrInputDocument();
>>                                 String docID = this.getName() + "_doc_" + j++;
>>                                 doc.addField("id", docID);
>>                                 doc.addField("content", "document_" + docID);
>>                                 docs.add(doc);
>>                             }
>>                             server.add(docs);
>>                             System.out.println(this.getName() + " added " + docs.size() + " documents");
>>                             Thread.sleep(100);
>>                         } catch (Exception e) {
>>                             e.printStackTrace();
>>                             System.err.println(this.getName() + " " + e.getLocalizedMessage());
>>                             System.exit(0);
>>                         }
>>                     }
>>                 }
>>             }.start();
>>         }
>>
>>         new Thread("optimizer-thread") {
>>             public void run() {
>>                 while (true) {
>>                     try {
>>                         Thread.sleep(1);
>>                         server.optimize();
>>                         System.out.println(this.getName() + " optimized");
>>                     } catch (Exception e) {
>>                         e.printStackTrace();
>>                         System.err.println("optimizer " + e.getLocalizedMessage());
>>                         System.exit(0);
>>                     }
>>                 }
>>             }
>>         }.start();
>>     } catch (Exception e) {
>>         e.printStackTrace();
>>     }
>> }
>>
>> -Original Message-
>> From: Lance Norskog [mailto:goks...@gmail.com]
>> Sent: Tuesday, October 13, 2009 8:59 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: StreamingUpdateSolrServer - indexing process stops in a couple of hours
>>
>> Which Java release is this? There are known thread-blocking problems in
>> Java 1.5.
>>
>> Also, what sockets are used during this time? Try 'netstat -s | fgrep 8983'
>> (or your Solr URL port #) and watch the active, TIME_WAIT, CLOSE_WAIT
>> sockets build up. This may give a hint.
>>
>> On Tue, Oct 13, 2009 at 8:47 AM, Dadasheva, Olga <olga_dadash...@harvard.edu> wrote:
>>> Hi,
>>>
>>> I am indexing documents using StreamingUpdateSolrServer. My 'setup'
>>> code is almost a copy of the junit test of the Solr trunk.
>>>
Re: leading and trailing wildcard query
A. Steven Anderson wrote:
> No thoughts on this? Really!?
>
> I would hate to admit to my Oracle DBE that Solr can't be customized to do
> a common query that a relational database can do. :-(
>
> On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson
> <a.steven.ander...@gmail.com> wrote:
>
>> I've scoured the archives and JIRA, but the answer to my question is just
>> not clear to me.
>>
>> With all the new Solr 1.4 features, is there any way to do a leading and
>> trailing wildcard query on an *untokenized* field?
>>
>> e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx
>>
>> Yes, I know how expensive such a query would be, but we have the user
>> requirement, nonetheless. If not, any suggestions on how to implement a
>> custom solution using Solr? Using an external data structure?

You can use ReversedWildcardFilterFactory, which creates additional tokens (in your case, a single additional token :) ) that are reversed, _and_ also triggers setAllowLeadingWildcards in the QueryParser - it won't help much with the performance though, due to the trailing wildcard in your original query. Please see the discussion in SOLR-1321 (this will be available in 1.4, but it should be easy to patch 1.3 to use it).

If you really need to support such queries efficiently you should implement full permuterm indexing, i.e. a token filter that rotates tokens and adds all rotations (with a special marker to mark the beginning of the word), and a query plugin that detects such query terms and rotates the query term appropriately.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web; Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
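The permuterm indexing Andrzej sketches at the end fits in a few lines: rotate each term past an end-of-word marker so that any substring query becomes a prefix query against the rotations. A toy illustration — the '$' marker and the rewrite logic are arbitrary choices here:

```java
import java.util.ArrayList;
import java.util.List;

// Permuterm sketch: index every rotation of term + "$". A query *abc* then
// rewrites to the prefix query "abc" over the rotations, which Lucene can
// answer efficiently, at the cost of ~|term| extra tokens per term.
public class Permuterm {
    static List<String> rotations(String term) {
        String t = term + "$"; // end-of-word marker
        List<String> out = new ArrayList<>();
        for (int i = 0; i < t.length(); i++) {
            out.add(t.substring(i) + t.substring(0, i));
        }
        return out;
    }

    public static void main(String[] args) {
        // "xxxabcxxx" yields a rotation starting with "abc", so the substring
        // query *abc* becomes a simple prefix match over the rotations.
        boolean match = rotations("xxxabcxxx").stream().anyMatch(r -> r.startsWith("abc"));
        System.out.println(match); // true
    }
}
```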
Re: field queries seem slow
Restarting Solr clears out all caching. Doing a commit used to drop all of the caches for new requests, but it no longer does this. On Linux you can clear the kernel's disk buffer cache with a special hook. You echo '1' into a /proc/something and this tells the kernel to drop its caches. Sorry, don't remember the exact command. On Thu, Nov 5, 2009 at 10:09 AM, Otis Gospodnetic wrote: > Hi, > > There is no way that I know to clear Solr's caches (query, document, filter > caches). > FIeldCache is a Lucene thing and it's also something you can't clear, as far > as I know. > > Slowness on start could be due to: > > * OS not cached the index yet (would be the case if your Solr was down for a > while and its index got displaced from the OS buffers) > * sort query run for the first time, FieldCache not populated yet > * expensive query run for the first time, its results and hits not cached in > Solr caches > > * ... > > Otis > > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message >> From: mike anderson >> To: solr-user@lucene.apache.org >> Sent: Thu, November 5, 2009 11:34:59 AM >> Subject: Re: field queries seem slow >> >> On production our servers are restarted very rarely (once a month). But this >> raises a question, what does it take to clear the cache? On my benchmarking >> platform I've been simply restarting the server as a method of starting >> fresh. Is there a cache file I could delete to make sure I'm getting >> unbiased results? Second of all, is there an internal cache for sort fields >> separate from the cache for queries and filters which has settings found in >> the solrconfig.xml file? >> >> I did a test as you suggested to determine if that type of query is always >> slow or just when it starts up, it seems that it is only slow when it starts >> up. However, it seems to be slow when it starts up with and without sorting. 
>> (I'm still trying to figure out how to do good benchmarking with one >> independent variable, so it's possible that this result is inconsistent) >> >> for reference, my query is looking like this (+/- sort field): >> >> http://10.0.20.174:8986/solr/select?mlt=false&rows=10&shards=localhost:8986/solr,localhost:8986/solr,localhost:8986/solr&q=abbrev_authors%3A%22Gallinger+S%22 >> >> I like the suggestion on date resolution, we definitely don't need second >> accuracy (which it is now), and in fact I think we'll just start stamping >> documents with year/week and then sort by that. >> >> >> thanks for all your help! >> >> Cheers, >> Mike >> >> >> >> On Wed, Nov 4, 2009 at 2:07 PM, Erick Erickson wrote: >> >> > By readers, I meant your searchers. Perhaps you were shutting >> > down your servers? >> > >> > The warming isn't to pre-load authors, it's to pre-populate, particularly, >> > sort fields. Which are then kept in caches. There is considerable >> > overhead in loading the sort field the first time you sort by it. So, >> > my question was really based on the chance that "over the >> > weekend" corresponded to "the first queries after the server >> > restarted", or "the first query after the underlying index searchers >> > were (re)opened. >> > >> > The real question comes down to whether the same form of query >> > (i.e. searching for different values on the same fields with the >> > same kind of sort) is slow all the time or just when things start up. >> > >> > How fine is the resolution for your dates? Assuming that the sorting >> > is the issue, if you are storing dates in the millisecond range, that's >> > probably 20M dates that have to be loaded to sort. You might >> > want to think about a coarser resolution if this has any relevance. >> > >> > HTH >> > Erick >> > >> > On Wed, Nov 4, 2009 at 1:54 PM, mike anderson >> > >wrote: >> > >> > > Erik, we are doing a sort by date first, and then by score. I'm not sure >> > > what you mean by readers. 
>> > > Since we have nearly 6M authors attached to our 20M documents I'm not sure
>> > > that autowarming would help that much (especially since we have very little
>> > > overlap in what users are searching for). But maybe it would?
>> > >
>> > > Lance, I was just being a bit lazy. thanks though.
>> > >
>> > > -mike
>> > >
>> > > On Mon, Nov 2, 2009 at 10:27 PM, Lance Norskog wrote:
>> > >
>> > > > This searches author:albert and (default text field): einstein. This
>> > > > may not be what you expect?
>> > > >
>> > > > On Mon, Nov 2, 2009 at 2:30 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> > > > > Hmmm, are you sorting? And have your readers been reopened? Is the
>> > > > > second query of that sort also slow? If the answer to this last
>> > > > > question is "no", have you tried some autowarming queries?
>> > > > >
>> > > > > Best
>> > > > > Erick
>> > > > >
>> > > > > On Mon, Nov 2, 2009 at 4:34 PM, mike anderson <
Re: Newb Question about the TemplateTransformer
I think you need custom code for this. You can write plugins in Java, or (in Java 1.6) any of the Java-based scripting languages like JavaScript. http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer On Thu, Nov 5, 2009 at 8:54 AM, Mark Ellul wrote: > Hi Noble, > > Thanks for the response... > > My data config is below... Basically I have a List table and a Tweeter > table... > > In the document I want a field called list_members which is a csv string of > all the rows where tweeter has the particular list id. > > Do you understand what I mean? > > Regards > > Mark > > > url="jdbc:postgresql://api.tlists.com:5432/tlists" > user="tlists_dev" > password="foocarrot4" > readOnly="true" autoCommit="false" > transactionIsolation="TRANSACTION_READ_COMMITTED" > holdability="CLOSE_CURSORS_AT_COMMIT" > /> > > transformer="TemplateTransformer"> > > > > query=" select id from api_tweeter where " template="" > --> > > > > ~ > > > 2009/11/5 Noble Paul നോബിള് नोब्ळ् > >> there is no parent document or child document there is only one. >> maybe you can paste your data-config >> >> On Thu, Nov 5, 2009 at 5:45 PM, Mark Ellul wrote: >> > Hi, >> > >> > I have read on the wiki that its possibile to concatenate values using a >> > TemplateTransformer. >> > >> > Basically I have a Parent Table, and Child Table, I need to create a >> > children field (in my Parent Document) which has all the ids of the >> Parent's >> > child rows in a comma separated string. >> > >> > Is this possible with the TemplateTransformer? >> > >> > if so can you please give me an snippet? >> > >> > Thanks and Regards >> > >> > Mark >> > >> >> >> >> -- >> - >> Noble Paul | Principal Engineer| AOL | http://aol.com >> > -- Lance Norskog goks...@gmail.com
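An alternative to transformer-side concatenation is to let the database build the CSV in a sub-entity. Since Mark is on PostgreSQL, something along these lines could work (needs PostgreSQL 8.4+ for array_agg; the table and column names are guesses based on the thread):

```xml
<entity name="list" query="select id from api_list">
  <!-- one row per list: child tweeter ids aggregated into a CSV string -->
  <entity name="members"
          query="select array_to_string(array_agg(id), ',') as list_members
                 from api_tweeter where list_id = '${list.id}'"/>
</entity>
```

The ${list.id} placeholder is DIH's standard parent-entity variable substitution, so the aggregate runs once per parent row.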
Re: DIH full-import with fetchSize(Integer.MIN_VALUE) taking long time to start processing rows
Right, a view will not help here. It is just an SQL query embedded as a virtual table, and is used to lift SQL syntax out of the DIH.

InnoDB locking is row-level except for auto-increment operations. Ow. You could drop the indexes on the table. Each insert batch has to recalculate all indexes, so this will cut the amount of database contention.

On Thu, Nov 5, 2009 at 6:47 AM, Marc Sturlese wrote:
>
> Hey there,
> I need this functionality because I have an indexer continuously updating my
> index with delta-import from a table. This table is fed by another process
> that is constantly running too.
> With delta-import there's no problem, but sometimes I need to execute
> full-import.
> I don't see the benefits of using a view. I think probably the best solution
> would be to create a master-slave mysql structure. The process that inserts
> would attack the master and the query would attack the slave. This would
> probably speed up the row processing, wouldn't it?
>
> Avlesh Singh wrote:
>>
>>> Parallelly I have another process which is doing lots of inserts to that
>>> table (I also had it before but with less number of inserts). Could this be
>>> causing some blocking that makes the query take that long? If not, any
>>> advice on what could make it take so long until I start to see rows being
>>> processed?
>>>
>> Sounds scary! With the innodb engine you are causing a table-level lock with
>> each insert (assuming your table has an auto-increment column). With
>> frequent inserts you are of course delaying the read time.
>> Why would you want to do this kind of an operation in the very first place?
>> Can't you use views for indexing?
>>
>> Cheers
>> Avlesh
>>
>> On Thu, Nov 5, 2009 at 6:18 PM, Marc Sturlese wrote:
>>
>>> I have been using fetchSize(Integer.MIN_VALUE) for a long time and it was
>>> working perfectly until now. I use MySQL, java 1.6,
>>> mysql-connector-java-5.1.7-bin.jar and InnoDB tables.
>>> Since a month ago, when the query is executed it takes a long time until
>>> it starts processing the results from the resultSet. The query matches about
>>> 2M rows. It used to take 10 min until row processing started. Now it's
>>> taking about 2 hours.
>>> Parallelly I have another process which is doing lots of inserts to that
>>> table (I also had it before but with less number of inserts). Could this be
>>> causing some blocking that makes the query take that long? If not, any
>>> advice on what could make it take so long until I start to see rows being
>>> processed?
>>> Thanks in advance.
>>> --
>>> View this message in context:
>>> http://old.nabble.com/DIH-full-import-with-fetchSize%28Integer.MIN_VALUE%29-taking-long-time-to-start-processing-rows-tp26213642p26213642.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/DIH-full-import-with-fetchSize%28Integer.MIN_VALUE%29-taking-long-time-to-start-processing-rows-tp26213642p26215730.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Lance Norskog
goks...@gmail.com
Re: Regarding to ramBufferSizeMB and mergeFactor
Hi, Jeff Newburn

Thank you for your good explanations. That helps me a lot.

Attachot Tuangphon

On 09/11/06 0:36, "Jeff Newburn" wrote:

> If I am correct, the two are related but not dependent on each other. Merge
> factor is used to determine how many segment files exist on disk, whereas
> the ram buffer determines how often the flush to disk will happen. So
> you should be able to set them independently.
> --
> Jeff Newburn
> Software Engineer, Zappos.com
> jnewb...@zappos.com - 702-943-7562
>
>> From: ATTACHOT TUANGPHON
>> Reply-To:
>> Date: Thu, 05 Nov 2009 22:50:47 +0900
>> To:
>> Subject: Regarding to ramBufferSizeMB and mergeFactor
>>
>> Hello, everybody
>>
>> I am a new Solr user.
>> I have a question about ramBufferSizeMB and mergeFactor.
>> I would like to know, if I increase the mergeFactor, do I have to
>> change the ramBufferSizeMB too?
>>
>> For example:
>> I set mergeFactor as 30 and ramBufferSizeMB as 50.
>> I would like to change mergeFactor to 50; do I have to increase
>> ramBufferSizeMB?
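In solrconfig.xml the two settings sit side by side and can be tuned independently, as Jeff says. A sketch using the values from the thread:

```xml
<indexDefaults>
  <!-- flush the in-memory buffer to a new segment once it reaches 50 MB -->
  <ramBufferSizeMB>50</ramBufferSizeMB>
  <!-- merge segments once 30 of them accumulate at a level -->
  <mergeFactor>30</mergeFactor>
</indexDefaults>
```

Raising mergeFactor only changes how many segments may exist before a merge; it does not require a larger RAM buffer, though a larger buffer does mean fewer, bigger flushed segments.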
Re: leading and trailing wildcard query
> Not sure what version it was supported from, but we're on 1.3. Really!? Great answer! Thanks! -- A. Steven Anderson
RE: leading and trailing wildcard query
Not sure what version it was supported from, but we're on 1.3. bern -Original Message- From: A. Steven Anderson [mailto:a.steven.ander...@gmail.com] Sent: Friday, 6 November 2009 10:25 AM To: solr-user@lucene.apache.org Subject: Re: leading and trailing wildcard query > Hi Steve, a query such as *abc* would need the NGramFilterFactor, hence the > doubleedgytext, and would be retrievable by a query such as contains:abc. > Note that you can set the max and minimum size of strings that get indexed. > Excellent! Just to clarify though, NGramFilterFactor is a Solr 1.4 feature only, correct? -- A. Steven Anderson
Re: Set MMap in Solr
Thanks for the help. -Brad Anderson 2009/11/5 Otis Gospodnetic > To use MMapDirectory, invoke Java with the System property > org.apache.lucene.FSDirectory.class set to > org.apache.lucene.store.MMapDirectory. This will cause > FSDirectory.getDirectory(File,boolean) to return instances of this class. > > So, start your servlet container with > -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message > > From: ba ba > > To: solr-user@lucene.apache.org > > Sent: Thu, November 5, 2009 2:55:42 PM > > Subject: Set MMap in Solr > > > > Hi, > > > > I'm trying to set my default directory to MMap. I saw that this is done > by > > specifying here > > > > A DirectoryProvider plugin can be configured in solrconfig.xml with the > > following XML: > > > > > > > > > > in solrconfig.xml. > > > > This did not work for me when I put in the MMapDirectory class name. > > > > I got this information from here > > > http://issues.apache.org/jira/browse/SOLR-465?focusedCommentId=12715282#action_12715282 > > > > I'm using the latest nightly build. > > > > If anyone knows how to configure solr to use MMap, please let me know. I > > would greatly appreciate it. > > > > Thanks. > >
Re: leading and trailing wildcard query
> Note that N-grams are limited to specific string lengths. I presume that > you need to search for arbitrary strings, not just three-letter ones. > Understood, but that is a limitation that we can live with. Thanks! -- A. Steven Anderson
Re: leading and trailing wildcard query
> Ah. With that restriction, it is impossible.
> If it is OK to pay Lucid to make a one-line change, you might be able to do
> it. Otherwise, get ready to spend a lot of money for a search engine.
>
Well, now that Lucid is getting In-Q-Tel $$$, they will soon learn that official releases are all that matter, and 12-18 month release cycles are not acceptable. ;-)

--
A. Steven Anderson
Re: leading and trailing wildcard query
Note that N-grams are limited to specific string lengths. I presume that you need to search for arbitrary strings, not just three-letter ones. wunder On Nov 5, 2009, at 3:23 PM, Bernadette Houghton wrote: Hi Steve, a query such as *abc* would need the NGramFilterFactor, hence the doubleedgytext, and would be retrievable by a query such as contains:abc. Note that you can set the max and minimum size of strings that get indexed. bern -Original Message- From: A. Steven Anderson [mailto:a.steven.ander...@gmail.com] Sent: Friday, 6 November 2009 10:08 AM To: solr-user@lucene.apache.org Subject: Re: leading and trailing wildcard query Thanks for the solution, but could you elaborate on how it would find something like *abc* in a field that contains abc. Steve On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: I've just set up something similar (much thanks to Avesh!)- maxGramSize="25" /> . . stored="false" multiValued="true"/> . . bern
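Walter's caveat about string lengths comes from how the n-gram filter works: it emits every substring whose length is between minGramSize and maxGramSize, and a "contains" query only matches if the query string itself is one of those grams. A plain-Java sketch of what the filter does to a single token:

```java
import java.util.ArrayList;
import java.util.List;

// Generate all substrings of a token with length in [min, max] -- a model of
// what an n-gram filter indexes. A query matches only if its whole text is
// among the grams, hence the dependence on the configured gram sizes.
public class NGrams {
    static List<String> grams(String token, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int len = min; len <= max; len++) {
            for (int i = 0; i + len <= token.length(); i++) {
                out.add(token.substring(i, i + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // with minGramSize=3: *abc* matches "xxxabcxxx" because "abc" is a gram
        System.out.println(grams("xxxabcxxx", 3, 25).contains("abc")); // true
        // with minGramSize=4 the 3-char query is shorter than any gram:
        System.out.println(grams("xxxabcxxx", 4, 25).contains("abc")); // false
    }
}
```

The trade-off is index size: the number of grams grows roughly with token length times the gram-size range, which is why min and max are worth tuning.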
Re: leading and trailing wildcard query
> Hi Steve, a query such as *abc* would need the NGramFilterFactory, hence the
> doubleedgytext, and would be retrievable by a query such as contains:abc.
> Note that you can set the max and minimum size of strings that get indexed.
>
Excellent! Just to clarify though, NGramFilterFactory is a Solr 1.4 feature only, correct?

--
A. Steven Anderson
Re: leading and trailing wildcard query
Ah. With that restriction, it is impossible. If it is OK to pay Lucid to make a one-line change, you might be able to do it. Otherwise, get ready to spend a lot of money for a search engine. wunder On Nov 5, 2009, at 3:18 PM, A. Steven Anderson wrote: Unfortunately, we can only use official releases (not even snapshots) since it's a government-related project. -- A. Steven Anderson
RE: leading and trailing wildcard query
Hi Steve, a query such as *abc* would need the NGramFilterFactory, hence the doubleedgytext, and would be retrievable by a query such as contains:abc. Note that you can set the maximum and minimum size of strings that get indexed. bern -Original Message- From: A. Steven Anderson [mailto:a.steven.ander...@gmail.com] Sent: Friday, 6 November 2009 10:08 AM To: solr-user@lucene.apache.org Subject: Re: leading and trailing wildcard query Thanks for the solution, but could you elaborate on how it would find something like *abc* in a field that contains abc. Steve On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > I've just set up something similar (much thanks to Avesh!)- > > positionIncrementGap="100"> > > > >maxGramSize="25" /> > > > > > > > > positionIncrementGap="100"> > > > >/> > > > > > > > . > . >multiValued="true"/> >stored="false" multiValued="true"/> > . > . > > > > > > > > > > > bern
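The schema XML quoted in this thread was stripped by the archive, leaving only fragments (positionIncrementGap="100", maxGramSize="25", the field names "contains" and "doubleedgytext"). A hedged reconstruction of the kind of n-gram field type those fragments suggest — the exact analyzer chain and minGramSize are assumptions:

```xml
<!-- Sketch: index every n-gram of the value so contains:abc matches any
     value with "abc" as a substring. minGramSize is an assumption. -->
<fieldType name="doubleedgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- a catch-all field using the type; copyField sources are omitted -->
<field name="contains" type="doubleedgytext" indexed="true" stored="false" multiValued="true"/>
```

With such a field, the wildcard query myfield:*abc* can be replaced by the plain term query contains:abc, at the cost of a larger index.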
Re: leading and trailing wildcard query
> Doesn't it work to call SolrQueryParser.setAllowLeadingWildcard? Good question. Anyone? > It can be really slow, what an RDBMS person would call a full table scan. Understood. > There is an open bug to make that settable in a config file, but this is a > pretty tiny change to the source. > http://issues.apache.org/jira/browse/SOLR-218 > Unfortunately, we can only use official releases (not even snapshots) since it's a government-related project. -- A. Steven Anderson
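For reference, the "one-line change" under discussion would be a patch sketch like this — the exact location in the Solr source is an assumption, and SOLR-218 tracks making it a config option:

```java
// Hypothetical patch fragment (not standalone code): Lucene's QueryParser
// rejects queries like *abc unless leading wildcards are enabled, so
// somewhere in SolrQueryParser's constructor one would add:
setAllowLeadingWildcard(true);
```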
Re: leading and trailing wildcard query
Because that is the semantics of Solr/Lucene wildcard syntax. * stands for "any number of any character". Basically, it enumerates all the terms in the field for all the documents and assembles a list of all of them that contain the substring "abc" and uses that as one of the clauses of your search... Best Erick On Thu, Nov 5, 2009 at 6:07 PM, A. Steven Anderson < a.steven.ander...@gmail.com> wrote: > Thanks for the solution, but could you elaborate on how it would find > something like *abc* in a field that contains abc. > > Steve > > On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton < > bernadette.hough...@deakin.edu.au> wrote: > > > I've just set up something similar (much thanks to Avesh!)- > > > > > positionIncrementGap="100"> > > > > > > > >> maxGramSize="25" /> > > > > > > > > > > > > > > > > > positionIncrementGap="100"> > > > > > > > >maxGramSize="25" > > /> > > > > > > > > > > > > > > . > > . > >> multiValued="true"/> > >> stored="false" multiValued="true"/> > > . > > . > > > > > > > > > > > > > > > > > > > > > > bern >
Re: leading and trailing wildcard query
Thanks for the solution, but could you elaborate on how it would find something like *abc* in a field that contains abc. Steve On Thu, Nov 5, 2009 at 5:25 PM, Bernadette Houghton < bernadette.hough...@deakin.edu.au> wrote: > I've just set up something similar (much thanks to Avesh!)- > > positionIncrementGap="100"> > > > >maxGramSize="25" /> > > > > > > > > positionIncrementGap="100"> > > > >/> > > > > > > > . > . >multiValued="true"/> >stored="false" multiValued="true"/> > . > . > > > > > > > > > > > bern
Re: solr query help alpha numeric and not
Hi, yes, it's a string; in the case of a title, it can be anything, a letter, a number, a symbol or a multibyte char etc. Any ideas if I wanted a query that was not a letter a-z or a number 0-9, given that it's a string? thanks Joel On Nov 4, 2009, at 9:10 AM, Jonathan Hendler wrote: Hi Joel, The ID is sent back as a string (instead of as an integer) in your example. Could this be the cause? - Jonathan On Nov 4, 2009, at 9:08 AM, Joel Nylund wrote: Hi, I have a field called firstLetterTitle, this field has 1 char, it can be anything, I need help with a few queries on this char: 1.) I want all NON ALPHA and NON numbers, so any char that is not A-Z or 0-9 I tried: http://localhost:8983/solr/select?q=NOT%20firstLetterTitle:0%20TO%209%20AND%20NOT%20firstLetterTitle:A%20TO%20Z But I get back numeric results: 9 23946447 2.) I want only Numerics: http://localhost:8983/solr/select?q=firstLetterTitle:0%20TO%209 This seems to work but just checking if it's the right way. 3.) I want only English Letters: http://localhost:8983/solr/select?q=firstLetterTitle:A%20TO%20Z This seems to work but just checking if it's the right way. thanks Joel
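A hedged aside: the URLs above omit the square brackets that Lucene range syntax requires, which may explain the odd results. Sketches of the three queries (field name taken from the message, shown unencoded):

```
# only digits
q=firstLetterTitle:[0 TO 9]

# only English letters
q=firstLetterTitle:[A TO Z]

# neither letters nor digits: subtract both ranges from all docs
# (a purely negative query needs the *:* base clause)
q=*:* -firstLetterTitle:[0 TO 9] -firstLetterTitle:[A TO Z]
```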
MoreLikeThis and filtering/restricting on "target" fields
I am trying to use MoreLikeThis (both the component and handler, trying combinations) and I would like to give it an input document reference which has a "source" field to analyze, and then get back other documents which have a given field that is used by MLT. My dataset is composed of documents like:

# Doc 1
id:Article:99 type_s:Article body_t: the body of the article...
# Doc 2
id:Article:646 type_s:Article body_t: another article...
# Doc 3
id:Community:44 type_s:Community description_t: description of this community...
# Doc 4
id:Community:34874 type_s:Community description_t: another description
# Doc 5
id:BlogPost:2384 type_s:BlogPost body_t: contents of some blog post

So I would like to say, "given an article (e.g. id:"Article:99") which has a field "body_t" that should be analyzed, give me back related Communities, and you will want to search on "description_t" for your analysis." When I run a basic query like: (using raw URL values for clarity, but they are encoded in reality) http://localhost:9007/solr/mlt?q=id:WikiArticle:948&mlt.fl=body_t then I get back a ton of other articles. Which is fine if my target type was Article. So how can I say "search on field A for your analysis of the input document, but for related terms use field B, filtered by type_s"? It seems that I can really only specify one field via mlt.fl. I have tried using MLT as a search component so that it has access to filter queries (via fq) but I cannot seem to get it to give me any data other than more of the same, that is, I can get a ton of Articles back but not other "content types". Am I just trying to do too much? Thanks /Cody
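One hedged avenue: the MoreLikeThis handler accepts filter queries, and mlt.fl takes a comma-separated list of fields, so a request along these lines may get partway there (the mlt.mintf/mlt.mindf values are assumptions; the "interesting terms" are still drawn only from the mlt.fl fields of the source document):

```
http://localhost:9007/solr/mlt?q=id:Article:99
    &mlt.fl=body_t,description_t
    &fq=type_s:Community
    &mlt.mintf=1&mlt.mindf=1
```

The fq restricts the returned documents to Communities, and listing description_t in mlt.fl makes MLT search that field too. The remaining catch is that the terms themselves come from the article's fields, so they only match Communities whose description_t shares vocabulary with body_t.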
Re: leading and trailing wildcard query
Doesn't it work to call SolrQueryParser.setAllowLeadingWildcard? It can be really slow, what an RDBMS person would call a full table scan. There is an open bug to make that settable in a config file, but this is a pretty tiny change to the source. http://issues.apache.org/jira/browse/SOLR-218 wunder On Nov 5, 2009, at 2:13 PM, Otis Gospodnetic wrote: The guilt trick is not the best thing to try on public mailing lists. :) The first thing that popped to my mind is to use 2 fields, where the second one contains the desrever string of the first one. The second idea is to use n-grams (if it's OK to tokenize), more specifically edge n-grams. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: A. Steven Anderson To: solr-user@lucene.apache.org Sent: Thu, November 5, 2009 3:04:32 PM Subject: Re: leading and trailing wildcard query No thoughts on this? Really!? I would hate to admit to my Oracle DBE that Solr can't be customized to do a common query that a relational database can do. :-( On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson < a.steven.ander...@gmail.com> wrote: I've scoured the archives and JIRA , but the answer to my question is just not clear to me. With all the new Solr 1.4 features, is there any way to do a leading and trailing wildcard query on an *untokenized* field? e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx Yes, I know how expensive such a query would be, but we have the user requirement, nonetheless. If not, any suggestions on how to implement a custom solution using Solr? Using an external data structure? -- A. Steven Anderson
RE: leading and trailing wildcard query
I've just set up something similar (much thanks to Avesh!)- . . . . bern -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Friday, 6 November 2009 9:13 AM To: solr-user@lucene.apache.org Subject: Re: leading and trailing wildcard query The guilt trick is not the best thing to try on public mailing lists. :) The first thing that popped to my mind is to use 2 fields, where the second one contains the desrever string of the first one. The second idea is to use n-grams (if it's OK to tokenize), more specifically edge n-grams. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: A. Steven Anderson > To: solr-user@lucene.apache.org > Sent: Thu, November 5, 2009 3:04:32 PM > Subject: Re: leading and trailing wildcard query > > No thoughts on this? Really!? > > I would hate to admit to my Oracle DBE that Solr can't be customized to do a > common query that a relational database can do. :-( > > > On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson < > a.steven.ander...@gmail.com> wrote: > > > I've scoured the archives and JIRA , but the answer to my question is just > > not clear to me. > > > > With all the new Solr 1.4 features, is there any way to do a leading and > > trailing wildcard query on an *untokenized* field? > > > > e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx > > > > Yes, I know how expensive such a query would be, but we have the user > > requirement, nonetheless. > > > > If not, any suggestions on how to implement a custom solution using Solr? > > Using an external data structure? > > > > > -- > A. Steven Anderson
Re: leading and trailing wildcard query
> > The guilt trick is not the best thing to try on public mailing lists. :) > Point taken, although not my intention. I guess I have been spoiled by quick replies and was getting to think it was a stupid question. Plus, I'm literally gonna get trash talk from my Oracle DBE if I can't make this work. ;-) We've basically relegated Oracle to handling ingest from which we index Solr and provide all search features. I'd hate to have to succumb to using Oracle to service this one special query. > The first thing that popped to my mind is to use 2 fields, where the second > one contains the desrever string of the first one. > Please elaborate. What do you mean by *desrever* string? > The second idea is to use n-grams (if it's OK to tokenize), more > specifically edge n-grams. > Well, that's the problem. The field may have non-Latin characters that may not have whitespace nor punctuation. -- A. Steven Anderson
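For what it's worth, "desrever" is "reversed" reversed: the trick is to index a second field holding the reversed value, so a pure leading wildcard *abc becomes a cheap prefix query cba* on that field (the double-ended *abc* case still needs n-grams or similar). A minimal sketch with hypothetical helper names:

```java
// Sketch of the reversed-companion-field trick (names are hypothetical):
// at index time, store the reversed value in a second field; at query time,
// rewrite a leading-wildcard term into a prefix query against that field.
public class ReversedFieldSketch {

    // value to store in the companion field (e.g. "myfield_rev") at index time
    static String reversedFieldValue(String original) {
        return new StringBuilder(original).reverse().toString();
    }

    // rewrite a leading-wildcard term (*abc) into a prefix term (cba*)
    static String rewriteLeadingWildcard(String term) {
        String body = term.substring(1); // drop the leading '*'
        return new StringBuilder(body).reverse().toString() + "*";
    }

    public static void main(String[] args) {
        // "xxxabc" matches *abc; its reversed form starts with "cba"
        System.out.println(reversedFieldValue("xxxabc"));   // cbaxxx
        System.out.println(rewriteLeadingWildcard("*abc")); // cba*
    }
}
```

Prefix queries on the reversed field avoid the full term enumeration that a leading wildcard forces on the original field.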
Re: Set MMap in Solr
To use MMapDirectory, invoke Java with the System property org.apache.lucene.FSDirectory.class set to org.apache.lucene.store.MMapDirectory. This will cause FSDirectory.getDirectory(File,boolean) to return instances of this class. So, start your servlet container with -Dorg.apache.lucene.FSDirectory.class=org.apache.lucene.store.MMapDirectory Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: ba ba > To: solr-user@lucene.apache.org > Sent: Thu, November 5, 2009 2:55:42 PM > Subject: Set MMap in Solr > > Hi, > > I'm trying to set my default directory to MMap. I saw that this is done by > specifying here > > A DirectoryProvider plugin can be configured in solrconfig.xml with the > following XML: > > > > > in solrconfig.xml. > > This did not work for me when I put in the MMapDirectory class name. > > I got this information from here > http://issues.apache.org/jira/browse/SOLR-465?focusedCommentId=12715282#action_12715282 > > I'm using the latest nightly build. > > If anyone knows how to configure solr to use MMap, please let me know. I > would greatly appreciate it. > > Thanks.
Re: leading and trailing wildcard query
The guilt trick is not the best thing to try on public mailing lists. :) The first thing that popped to my mind is to use 2 fields, where the second one contains the desrever string of the first one. The second idea is to use n-grams (if it's OK to tokenize), more specifically edge n-grams. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: A. Steven Anderson > To: solr-user@lucene.apache.org > Sent: Thu, November 5, 2009 3:04:32 PM > Subject: Re: leading and trailing wildcard query > > No thoughts on this? Really!? > > I would hate to admit to my Oracle DBE that Solr can't be customized to do a > common query that a relational database can do. :-( > > > On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson < > a.steven.ander...@gmail.com> wrote: > > > I've scoured the archives and JIRA , but the answer to my question is just > > not clear to me. > > > > With all the new Solr 1.4 features, is there any way to do a leading and > > trailing wildcard query on an *untokenized* field? > > > > e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx > > > > Yes, I know how expensive such a query would be, but we have the user > > requirement, nonetheless. > > > > If not, any suggestions on how to implement a custom solution using Solr? > > Using an external data structure? > > > > > -- > A. Steven Anderson
Re: how to use ajax-solr - example?
google "applying a diff patch" http://www.linuxjournal.com/article/1237 looks like a good start. On Thu, Nov 5, 2009 at 6:39 AM, Joel Nylund wrote: > this is exactly what I was looking for, any directions how to install? I > don't really understand how to use a .patch file. > > thanks > Joel > > On Nov 4, 2009, at 9:16 PM, Lance Norskog wrote: >> http://issues.apache.org/jira/browse/SOLR-1163 >> >> This is a really nice index browser. >> >> On Wed, Nov 4, 2009 at 12:51 PM, Joel Nylund wrote: >>> >>> Hi Israel, >>> >>> I agree the idea of adding a scripting language in between is good, but I >>> want something simple I can easily test my queries with data and scroll >>> through the results. I have been using the browser and getting xml for >>> now, >>> but would like to save my queries in a simple html page and format the >>> data. >>> >>> I figured this is something I can throw together in a few hours, but I >>> also >>> figured someone would have already done the work. >>> >>> thanks >>> Joel >>> >>> On Nov 4, 2009, at 2:02 PM, Israel Ekpo wrote: On Wed, Nov 4, 2009 at 10:48 AM, Joel Nylund wrote: > Hi, I looked at the documentation and I have no idea how to get > started? > Can someone point me to or show me an example of how to send a query to > a > solr server and paginate through the results using ajax-solr. > > I would gladly write a blog tutorial on how to do this if someone can > get > me > started. > > I don't know jquery but have used prototype & scriptaculous. > > thanks > Joel > > Joel, It will be best if you use a scripting language between Solr and JavaScript This is because sending data only between JavaScript and Solr will limit you to only one domain name. However, if you are using a scripting language between JavaScript and Solr you can use the scripting language to retrieve the request parameters from JavaScript and then send them to Solr with the response writer set to json. 
This will cause Solr to send the response in JSON format which the scripting language can pass on to JavaScript. This example here will cause Solr to return the response in JSON. http://example.com:8443/solr/select?q=searchkeyword&wt=json -- "Good Enough" is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. >>> >>> >> >> >> >> -- >> Lance Norskog >> goks...@gmail.com > > -- Lance Norskog goks...@gmail.com
Re: Sending file to Solr via HTTP POST
Here is a brief example of how to use SolrJ with the ExtractingRequestHandler: ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract"); req.addFile(fileToIndex); req.setParam("literal.id", getId(fileToIndex)); req.setParam("literal.hostname", getHostname()); req.setParam("literal.filename", fileToIndex.getName()); try { getSolrServer().request(req); } catch (SolrServerException e) { e.printStackTrace(); } You'll need a request handler configured in solrconfig.xml: text true ignored_ Note that the example also shows how to use the "literal.*" parameter to add metadata fields of your choice to the document. Hope that helps get you started. -Jay http://www.lucidimagination.com On Tue, Nov 3, 2009 at 10:38 PM, Caroline Tan wrote: > Hi, > From the Solr wiki on ExtractingRequestHandler tutorial, when it comes to > the part to post file to Solr, it always uses the curl command, e.g. > curl ' > http://localhost:8983/*solr*/update/extract?literal.id=doc1&commit=true' > -F myfi...@tutorial.html > > I have never used curl and i was thinking is there any replacement to such > method? > > Is there any API that i can use to achieve the same thing in a java > project without relying on CURL? Does SolrJ have such method? Thanks > > ~caroLine >
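The solrconfig.xml snippet in the message above was mangled by the archive; the surviving residue ("text true ignored_") suggests a hedged reconstruction like the following, matching the stock Solr 1.4 extraction example — treat the defaults as assumptions:

```xml
<!-- Hedged reconstruction of the extraction handler config: map Tika's
     content to the "text" field, lowercase incoming field names, and
     prefix unknown fields with "ignored_" so they are dropped. -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```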
Re: leading and trailing wildcard query
No thoughts on this? Really!? I would hate to admit to my Oracle DBE that Solr can't be customized to do a common query that a relational database can do. :-( On Wed, Nov 4, 2009 at 6:01 PM, A. Steven Anderson < a.steven.ander...@gmail.com> wrote: > I've scoured the archives and JIRA , but the answer to my question is just > not clear to me. > > With all the new Solr 1.4 features, is there any way to do a leading and > trailing wildcard query on an *untokenized* field? > > e.g. q=myfield:*abc* would return a doc with myfield=xxxabcxxx > > Yes, I know how expensive such a query would be, but we have the user > requirement, nonetheless. > > If not, any suggestions on how to implement a custom solution using Solr? > Using an external data structure? > > -- A. Steven Anderson
Set MMap in Solr
Hi, I'm trying to set my default directory to MMap. I saw that this is done by configuring a DirectoryProvider plugin in solrconfig.xml. This did not work for me when I put in the MMapDirectory class name. I got this information from here http://issues.apache.org/jira/browse/SOLR-465?focusedCommentId=12715282#action_12715282 I'm using the latest nightly build. If anyone knows how to configure Solr to use MMap, please let me know. I would greatly appreciate it. Thanks.