RE: probable sore topic... Maven build?
Hi Paul, As Grant mentioned, there is a full complement of Maven POMs stored under the top-level dev-tools/ directory. You can populate the source tree with POMs by running 'ant get-maven-poms' from the top level. Before the Maven build will function fully, run 'mvn -N -P bootstrap install' at the top level to install the non-Mavenized dependencies in your local repository. At this point the Maven build should work. Please report any problems you find. Maven artifacts are built nightly via 'ant generate-maven-artifacts' by Jenkins. The Maven build is also exercised nightly via 'mvn install' from the top level. See this wiki page for pointers to the nightly Lucene/Solr Maven builds, the scripts that run them, and the published maven artifacts: http://wiki.apache.org/solr/NightlyBuilds Feel free to ask more questions. Steve > -Original Message- > From: Paul R. Brown [mailto:p...@mult.ifario.us] > Sent: Thursday, April 21, 2011 4:38 PM > To: dev@lucene.apache.org > Subject: probable sore topic... Maven build? > > > Hi, Dev@ -- > > I'm sure that there have been plenty of discussions previously about the > current build and Maven artifacts; I am not looking to restart those > discussions. That said, I am going to make a Maven-centric build of the > Lucene/Solr universe, mostly because I need to build Maven artifacts from > trunk on a regular basis, and I'm both happy to share with other folks if > they're interested or build on existing efforts if someone cares to point > me in the right direction. > > Thanks in advance. > > -- Paul > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org
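For anyone following along, Steve's bootstrap steps amount to this command sequence, run from the top-level checkout (commands exactly as given above; shown here only as a recap, not independently verified):

```
ant get-maven-poms            # populate the source tree with POMs from dev-tools/
mvn -N -P bootstrap install   # one-time: install the non-Mavenized dependencies locally
mvn install                   # the regular Maven build should now work
```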
[jira] [Updated] (LUCENE-3040) analysis consumers should use reusable tokenstreams
[ https://issues.apache.org/jira/browse/LUCENE-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3040: Attachment: LUCENE-3040.patch > analysis consumers should use reusable tokenstreams > --- > > Key: LUCENE-3040 > URL: https://issues.apache.org/jira/browse/LUCENE-3040 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3040.patch > > > Some analysis consumers (highlighter, more like this, memory index, contrib > queryparser, ...) are using Analyzer.tokenStream but should be using > Analyzer.reusableTokenStream instead for better performance. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3040) analysis consumers should use reusable tokenstreams
analysis consumers should use reusable tokenstreams --- Key: LUCENE-3040 URL: https://issues.apache.org/jira/browse/LUCENE-3040 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Fix For: 3.2, 4.0 Attachments: LUCENE-3040.patch Some analysis consumers (highlighter, more like this, memory index, contrib queryparser, ...) are using Analyzer.tokenStream but should be using Analyzer.reusableTokenStream instead for better performance. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
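For readers unfamiliar with the distinction behind this issue: Analyzer.tokenStream builds a new analysis chain on every call, while Analyzer.reusableTokenStream hands back a per-thread cached chain that is merely reset against new input. The sketch below is a pure-JDK analogue of that caching pattern; the Pipeline class is a hypothetical stand-in, not Lucene's TokenStream.

```java
// Pure-JDK analogue of tokenStream vs. reusableTokenStream. "Pipeline" is a
// hypothetical stand-in for an analysis chain: expensive to construct,
// cheap to reset against new input. Not Lucene code.
import java.util.concurrent.atomic.AtomicInteger;

public class ReusePattern {
    static final AtomicInteger ALLOCATIONS = new AtomicInteger();

    static final class Pipeline {
        Pipeline() { ALLOCATIONS.incrementAndGet(); }  // costly construction
        void reset(String input) { /* re-point at new input; no allocation */ }
    }

    // tokenStream() style: a brand-new pipeline on every call.
    static Pipeline tokenStream(String input) {
        return new Pipeline();
    }

    // reusableTokenStream() style: one cached pipeline per thread, reset per call.
    static final ThreadLocal<Pipeline> CACHED = ThreadLocal.withInitial(Pipeline::new);

    static Pipeline reusableTokenStream(String input) {
        Pipeline p = CACHED.get();
        p.reset(input);
        return p;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            reusableTokenStream("doc " + i);  // reuses one Pipeline per thread
        }
        System.out.println("pipelines allocated: " + ALLOCATIONS.get());
    }
}
```

The point of the patch is exactly this: consumers that call tokenStream in a loop pay the construction cost per document, while reusableTokenStream amortizes it.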
[jira] [Commented] (SOLR-2471) Localparams not working with 2 fq parameters using qt=name
[ https://issues.apache.org/jira/browse/SOLR-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023108#comment-13023108 ] Bill Bell commented on SOLR-2471: - This is not a question. It is a bug. Here is the problem: I want to do two FQs (filter queries) to reduce the results. Each FQ is a subquery. For example: fq={!type=dismax qt=dismaxname v="bill"} fq={!type=dismax qt=anotherfield v="doctor"} What happens is that the QT (from solrconfig) dismaxname is used for both FQs. We need it to support 2 different QT names. Is this clear? If not, I will send an example that shows the breakage. > Localparams not working with 2 fq parameters using qt=name > -- > > Key: SOLR-2471 > URL: https://issues.apache.org/jira/browse/SOLR-2471 > Project: Solr > Issue Type: Bug >Reporter: Bill Bell > > We are having a problem with the following query. If we have two localparams > (using fq) and use QT= it does not work. > This does not find any results: > http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax > qt=namespec v=$qspec}&fq={!type=dismax qt=dismaxname > v=$qname}&q=_val_:"{!type=dismax qt=namespec v=$qspec}" _val_:"{!type=dismax > qt=dismaxname > v=$qname}"&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score > desc&rows=1000&start=0 > This works okay. It returns a few results.
> http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax > qf=$qqf v=$qspec}&fq={!type=dismax qt=dismaxname > v=$qname}&q=_val_:"{!type=dismax qf=$qqf v=$qspec}" _val_:"{!type=dismax > qt=dismaxname v=$qname}" &qqf=specialties_ngram^1.0 > specialties_search^2.0&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score > desc&rows=1000&start=0 > We would like to use a QT for both terms but it seems there is some kind of > bug when using two localparams and dismax filters with QT. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2471) Localparams not working with 2 fq parameters using qt=name
[ https://issues.apache.org/jira/browse/SOLR-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023110#comment-13023110 ] Bill Bell commented on SOLR-2471: - Hoss, Since they are LocalParams, why can't we have TWO QTs? One for each subquery? Based on what you are saying, this would be a new feature request... > Localparams not working with 2 fq parameters using qt=name > -- > > Key: SOLR-2471 > URL: https://issues.apache.org/jira/browse/SOLR-2471 > Project: Solr > Issue Type: Bug >Reporter: Bill Bell > > We are having a problem with the following query. If we have two localparams > (using fq) and use QT= it does not work. > This does not find any results: > http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax > qt=namespec v=$qspec}&fq={!type=dismax qt=dismaxname > v=$qname}&q=_val_:"{!type=dismax qt=namespec v=$qspec}" _val_:"{!type=dismax > qt=dismaxname > v=$qname}"&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score > desc&rows=1000&start=0 > This works okay. It returns a few results. > http://localhost:8983/solr/provs/select?qname=john&qspec=dent&fq={!type=dismax > qf=$qqf v=$qspec}&fq={!type=dismax qt=dismaxname > v=$qname}&q=_val_:"{!type=dismax qf=$qqf v=$qspec}" _val_:"{!type=dismax > qt=dismaxname v=$qname}" &qqf=specialties_ngram^1.0 > specialties_search^2.0&fl=specialties_desc,score,hgid,specialties_search,specialties_ngram,first_middle_last_name&wt=csv&facet=true&facet.field=specialties_desc&sort=score > desc&rows=1000&start=0 > We would like to use a QT for both terms but it seems there is some kind of > bug when using two localparams and dismax filters with QT. -- This message is automatically generated by JIRA. 
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7329 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7329/

1 tests failed.

REGRESSION: org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe

Error Message: Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2894)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589)
	at java.lang.StringBuffer.append(StringBuffer.java:337)
	at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617)
	at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93)
	at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304)
	at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1082)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1010)

Build Log (for compile errors): [...truncated 5276 lines...]
[HUDSON] Lucene-trunk - Build # 1537 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-trunk/1537/

21 tests failed.

REGRESSION: org.apache.lucene.index.TestNRTThreads.testNRTThreads

Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/nrtopenfiles.4311211294863747903/_bx.tvd (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/nrtopenfiles.4311211294863747903/_bx.tvd (Too many open files in system)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
	at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput$Descriptor.<init>(SimpleFSDirectory.java:69)
	at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:90)
	at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.<init>(NIOFSDirectory.java:91)
	at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:78)
	at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:345)
	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:374)
	at org.apache.lucene.store.Directory.openInput(Directory.java:122)
	at org.apache.lucene.index.TermVectorsReader.<init>(TermVectorsReader.java:83)
	at org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:236)
	at org.apache.lucene.index.SegmentReader.openDocStores(SegmentReader.java:515)
	at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:611)
	at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:560)
	at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:172)
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:360)
	at org.apache.lucene.index.DirectoryReader.doReopenFromWriter(DirectoryReader.java:419)
	at org.apache.lucene.index.DirectoryReader.doReopen(DirectoryReader.java:432)
	at org.apache.lucene.index.DirectoryReader.reopen(DirectoryReader.java:392)
	at org.apache.lucene.index.TestNRTThreads.testNRTThreads(TestNRTThreads.java:213)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1160)

REGRESSION: org.apache.lucene.index.TestOmitNorms.testOmitNormsCombos

Error Message: /usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/test8730544420518378026tmp/_i_0.skp (Too many open files in system)

Stack Trace:
java.io.FileNotFoundException: /usr/home/hudson/hudson-slave/workspace/Lucene-trunk/checkout/lucene/build/test/1/test8730544420518378026tmp/_i_0.skp (Too many open files in system)
	at java.io.RandomAccessFile.open(Native Method)
	at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
	at org.apache.lucene.store.FSDirectory$FSIndexOutput.<init>(FSDirectory.java:448)
	at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:312)
	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:348)
	at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:139)
	at org.apache.lucene.index.codecs.sep.SepPostingsWriterImpl.<init>(SepPostingsWriterImpl.java:106)
	at org.apache.lucene.index.codecs.mockintblock.MockFixedIntBlockCodec.fieldsConsumer(MockFixedIntBlockCodec.java:114)
	at org.apache.lucene.index.PerFieldCodecWrapper$FieldsWriter.<init>(PerFieldCodecWrapper.java:64)
	at org.apache.lucene.index.PerFieldCodecWrapper.fieldsConsumer(PerFieldCodecWrapper.java:54)
	at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:78)
	at org.apache.lucene.index.TermsHash.flush(TermsHash.java:103)
	at org.apache.lucene.index.DocInverter.flush(DocInverter.java:65)
	at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:55)
	at org.apache.lucene.index.DocumentsWriter.flush(DocumentsWriter.java:567)
	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2497)
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2462)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1180)
	at org.apache.lucene.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:101)
	at org.apache.lucene.index.TestOmitNorms.getNorms(TestOmitNorms.java:285)
	at org.apache.lucene.index.TestOmitNorms.testOmitNormsCombos(TestOmitNorms.java:244)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1232)
	at org.apache.lucene.util.LuceneTestCaseRunn
Re: Stand-alone Index updating using EmbeddedSolrServer
On Thu, Apr 21, 2011 at 8:15 PM, Kiko Aumond wrote: > Yes, this is a CSV Loader. This looks like one of those cases where there > are many ways to handle 90% of the requirements but none that solves 100% of > the problem. Which is why the CSV loader also almost solves the problem, but > not quite. > > We're not using solr as a web app, just using the embedded server, which is > why we can't use curl and hence CSVLoader. So this is a purely command-line > driven application that runs against an embedded Solr server, no web > containers, for performance reasons. But I've already pointed out that if you were running the solr server, you could easily have just streamed the CSV directly from disk (even though the time savings are normally in the 1-2% range). Regardless, even if you're using embedded, you should still be able to pass "stream.url=file://my_local_file" via something like DirectSolrConnection or EmbeddedSolrServer and have the standard CSVLoader stream directly from the file. Of course if the CSV files are of any sufficient size, it's not going to matter if you kick off the stream via HTTP or embedded. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Stand-alone Index updating using EmbeddedSolrServer
Yes, this is a CSV Loader. This looks like one of those cases where there are many ways to handle 90% of the requirements but none that solves 100% of the problem. Which is why the CSV loader also almost solves the problem, but not quite. We're not using solr as a web app, just using the embedded server, which is why we can't use curl and hence CSVLoader. So this is a purely command-line driven application that runs against an embedded Solr server, no web containers, for performance reasons. On Thu, Apr 21, 2011 at 4:47 PM, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 7:27 PM, Kiko Aumond wrote: > > Yes, I've seen that page, but I went a bit beyond the material there, as > the > > code I wrote is able to set parameters such as separators, encapsulators > and > > the index columns, whether to split parameters, auto-commit as well as > the > > ability to do incremental or full index reloads. > > Is this a CSV loader? > If so, did you know the CSV loader (and other data loaders) have the > option to bypass HTTP also and stream directly from a local file (or > other URL)? > > > Also, from what I've seen in DirectSolrConnection (version 1.4.1), you > have > > to supply the document body as a String. We want to avoid having to load > > the entire document into memory, which is why we load the files into > > ContentStream objects and pass them to the embedded Solr server (I am > > assuming ContentStream actually streams the file as its name suggests > > instead of trying to load it into memory). The utility I wrote gets a > path, > > a Regex expression for all the files to be loaded, as well as the > parameters > > mentioned above and it does either a full or incremental upload of > multiple > > files with a single command. > > > > We run a very high load application with SOLR in the back end that > requires > > that we use the Embedded solr server to eliminate the network round-trip. > > Even a small incremental gain in performance is important for us.
> > Eliminating the network round-trip is certainly important for good > bulk indexing performance. Luckily you don't have to > embed to do that. You can use multiple threads (say 16 for a 4 core > server) that essentially covers up > any round-trip latency (use persistent connections though! or use > SolrJ which does by default), > or you can use the StreamingUpdateSolrServer that eliminates > round-trip network delays > by streaming documents over multiple already open connections. > > -Yonik > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May > 25-26, San Francisco >
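Yonik's thread-count advice is easy to see in isolation. The sketch below is plain JDK with no SolrJ; indexOneDoc is a hypothetical stand-in that only simulates a fixed network round trip, but it shows how a 16-thread pool hides per-request latency compared with a single thread.

```java
// Sketch: N concurrent indexing threads cover up per-request round-trip latency.
// indexOneDoc is a hypothetical stand-in for a SolrJ add() call.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelIndexing {
    static void indexOneDoc(int id) throws InterruptedException {
        Thread.sleep(5);  // pretend 5 ms network round trip per document
    }

    // Index `docs` documents on `threads` workers; returns elapsed milliseconds.
    static long run(int docs, int threads) throws Exception {
        long t0 = System.nanoTime();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docs; i++) {
            final int id = i;
            pool.submit(() -> { indexOneDoc(id); return null; });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // ~16 threads for a 4-core server, per Yonik's suggestion.
        System.out.println("16 threads: " + run(160, 16) + " ms");
        System.out.println(" 1 thread : " + run(160, 1) + " ms");
    }
}
```

With 160 simulated documents the single-threaded run has to serialize every round trip, while the pooled run overlaps them; the same effect is what persistent connections plus SolrJ, or StreamingUpdateSolrServer, buy you against a real server.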
Re: Stand-alone Index updating using EmbeddedSolrServer
On Thu, Apr 21, 2011 at 7:27 PM, Kiko Aumond wrote: > Yes, I've seen that page, but I went a bit beyond the material there, as the > code I wrote is able to set parameters such as separators, encapsulators and > the index columns, whether to split parameters, auto-commit as well as the > ability to do incremental or full index reloads. Is this a CSV loader? If so, did you know the CSV loader (and other data loaders) have the option to bypass HTTP also and stream directly from a local file (or other URL)? > Also, from what I've seen in DirectSolrConnection (version 1.4.1), you have > to supply the document body as a String. We want to avoid having to load > the entire document into memory, which is why we load the files into > ContentStream objects and pass them to the embedded Solr server (I am > assuming ContentStream actually streams the file as its name suggests > instead of trying to load it into memory). The utility I wrote gets a path, > a Regex expression for all the files to be loaded, as well as the parameters > mentioned above and it does either a full or incremental upload of multiple > files with a single command. > > We run a very high load application with SOLR in the back end that requires > that we use the Embedded solr server to eliminate the network round-trip. > Even a small incremental gain in performance is important for us. Eliminating the network round-trip is certainly important for good bulk indexing performance. Luckily you don't have to embed to do that. You can use multiple threads (say 16 for a 4 core server) that essentially covers up any round-trip latency (use persistent connections though! or use SolrJ which does by default), or you can use the StreamingUpdateSolrServer that eliminates round-trip network delays by streaming documents over multiple already open connections.
-Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
[jira] [Commented] (SOLR-2268) Add support for Point in Polygon searches
[ https://issues.apache.org/jira/browse/SOLR-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022980#comment-13022980 ] Alexander Kanarsky commented on SOLR-2268: -- Robert, in addition to David's work (SOLR-2155) you can also try my polygon extension for the JTeam's (Chris Male et al) Spatial Solr Plugin (SSP). The SSP is a standalone Solr module that works with Solr 1.4.x; you can get it here: http://www.jteam.nl/products/spatialsolrplugin.html It allows Radius search for the lat,lon documents. The polygon/polyline extension for SSP 2.0 in addition to this allows polygon search. It is located here: http://sourceforge.net/projects/ssplex/files/ There are some limitations (for example, plane geometry is used) but it may work just fine for you, depending on your situation. > Add support for Point in Polygon searches > - > > Key: SOLR-2268 > URL: https://issues.apache.org/jira/browse/SOLR-2268 > Project: Solr > Issue Type: New Feature >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll > Attachments: SOLR-2268.patch > > > In spatial applications, it is common to ask whether a point is inside of a > polygon. Solr could support two forms of this: > # A field contains a polygon and the user supplies a point. If it does, the > doc is returned. > # A document contains a point and the user supplies a polygon. If the point > is in the polygon, return the document > With both of these cases, it would be good to support the negative assertion, > too.
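For anyone wondering what the "plane geometry" simplification implies: once lat/lon are treated as plain x/y coordinates, point-in-polygon reduces to the classic even-odd ray-casting test. A self-contained sketch of that test (not the plugin's actual code):

```java
// Classic even-odd ray-casting point-in-polygon test on plane geometry
// (the same simplification the SSP extension notes). Illustrative only.
public class PointInPolygon {
    // xs/ys hold the polygon vertices in order; returns true if (px, py) is inside.
    public static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Count crossings of a horizontal ray extending right from the point.
            if ((ys[i] > py) != (ys[j] > py)
                    && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        double[] xs = {0, 4, 4, 0}, ys = {0, 0, 4, 4};  // a 4x4 square
        System.out.println(contains(xs, ys, 2, 2));  // true: inside
        System.out.println(contains(xs, ys, 5, 2));  // false: outside
    }
}
```

This plane-geometry version ignores the curvature and dateline/pole issues that a proper geodetic implementation has to handle, which is exactly the limitation Alexander mentions.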
Re: Stand-alone Index updating using EmbeddedSolrServer
Thank you, Yonik. Yes, I've seen that page, but I went a bit beyond the material there, as the code I wrote is able to set parameters such as separators, encapsulators and the index columns, whether to split parameters, auto-commit as well as the ability to do incremental or full index reloads. Also, from what I've seen in DirectSolrConnection (version 1.4.1), you have to supply the document body as a String. We want to avoid having to load the entire document into memory, which is why we load the files into ContentStream objects and pass them to the embedded Solr server (I am assuming ContentStream actually streams the file as its name suggests instead of trying to load it into memory). The utility I wrote gets a path, a Regex expression for all the files to be loaded, as well as the parameters mentioned above and it does either a full or incremental upload of multiple files with a single command. We run a very high load application with SOLR in the back end that requires that we use the Embedded solr server to eliminate the network round-trip. Even a small incremental gain in performance is important for us. On Thu, Apr 21, 2011 at 4:02 PM, Yonik Seeley wrote: > On Thu, Apr 21, 2011 at 6:26 PM, Kiko Aumond wrote: > > Hi > > > > I am new to the list and relatively new to SOLR. I am working on a tool > for > > updating indexes directly through EmbeddedSolrServer thus eliminating the > > need for sending potentially large documents over HTTP. > > http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer > And also DirectSolrConnection > > It's generally discouraged as a premature optimization that normally > gains you only a few percent increase in performance. > > > -Yonik > http://www.lucenerevolution.org -- Lucene/Solr User Conference, May > 25-26, San Francisco >
Re: Stand-alone Index updating using EmbeddedSolrServer
On Thu, Apr 21, 2011 at 6:26 PM, Kiko Aumond wrote: > Hi > > I am new to the list and relatively new to SOLR. I am working on a tool for > updating indexes directly through EmbeddedSolrServer thus eliminating the > need for sending potentially large documents over HTTP. http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer And also DirectSolrConnection It's generally discouraged as a premature optimization that normally gains you only a few percent increase in performance. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Stand-alone Index updating using EmbeddedSolrServer
Hi I am new to the list and relatively new to SOLR. I am working on a tool for updating indexes directly through EmbeddedSolrServer, thus eliminating the need for sending potentially large documents over HTTP. This tool could also be easily modified to allow for command-line querying, also using the embedded server. My questions are: 1) Does this functionality already exist and my Google searches just didn't find it? 2) If this functionality is not yet available for embedded servers, would there be interest in this code as a contribution to the Solr project, and how should I go about submitting the code? Thanks Kiko
Re: probable sore topic... Maven build?
Have a look at the dev-tools area under the trunk checkout. From there, you can use many maven tools if you would like. On Apr 21, 2011, at 10:38 PM, Paul R. Brown wrote: > > Hi, Dev@ -- > > I'm sure that there have been plenty of discussions previously about the > current build and Maven artifacts; I am not looking to restart those > discussions. That said, I am going to make a Maven-centric build of the > Lucene/Solr universe, mostly because I need to build Maven artifacts from > trunk on a regular basis, and I'm both happy to share with other folks if > they're interested or build on existing efforts if someone cares to point me > in the right direction. > > Thanks in advance. > > -- Paul > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
probable sore topic... Maven build?
Hi, Dev@ -- I'm sure that there have been plenty of discussions previously about the current build and Maven artifacts; I am not looking to restart those discussions. That said, I am going to make a Maven-centric build of the Lucene/Solr universe, mostly because I need to build Maven artifacts from trunk on a regular basis, and I'm both happy to share with other folks if they're interested or build on existing efforts if someone cares to point me in the right direction. Thanks in advance. -- Paul - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] Joining the Revolution
Comrade? Seems like there is a Lucene conference: http://www.lucenerevolution.com/ Anyone planning on attending?
Re: Indexing data with Trade Mark Symbol
I've had no issues with indexing strings with non-ASCII characters, but your web container may be interfering with what you're trying to do; see: http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config -- Paul On Thursday, April 7, 2011 at 1:02 AM, mechravi25 wrote: Hi, > Has anyone indexed the data with Trade Mark symbol??...when i tried to > index, the data appears as below... I want to see the Indexed data with TM > symbol > > Indexed Data: > 79797 - Siebel Research AI Fund, > 79797 - Siebel Research AI Fund,l > > > Original Data: > 79797 - Siebel Research™ AI Fund, > > > Please help me to resolve this > > Regards, > Ravi > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-data-with-Trade-Mark-Symbol-tp2789391p2789391.html > Sent from the Solr - Dev mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org >
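For reference, the usual container-side fix behind that wiki link: characters like ™ arrive percent-encoded in the request URI, and Tomcat decodes URIs as ISO-8859-1 unless told otherwise. The relevant setting is the Connector's URIEncoding attribute; a minimal server.xml fragment (port and other attributes illustrative):

```xml
<!-- conf/server.xml: tell Tomcat to decode request URIs as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>
```

Note this addresses query/URI decoding; data sent in request bodies at index time is governed by the body's own charset, so garbled indexed data can also mean the documents were posted with the wrong encoding declared.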
[jira] [Commented] (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022805#comment-13022805 ] Gerd Bremer commented on SOLR-2272: --- Is it possible to sort the join query result?

// first class of documents with refid and pagecount fields;
// a refid field maps to an id field in the second class of documents (1->100, 2->101)
doc1: id:1 refid:100 pagecount:35
doc2: id:2 refid:101 pagecount:45

// second class of documents with text field
doc100: id:100 text:hello world
doc101: id:101 text:goodbye

Now I would like to select the documents from the first class with field pagecount sorted descending, that is {doc2, doc1}, and return the mapped documents with text in the same order, that is {doc101, doc100}. Is this possible with join? I'm looking for an alternative to partial update and this join looks promising if I can sort and get the mapped result in the same order.

> Join > > > Key: SOLR-2272 > URL: https://issues.apache.org/jira/browse/SOLR-2272 > Project: Solr > Issue Type: New Feature > Components: search >Reporter: Yonik Seeley > Fix For: 4.0 > > Attachments: SOLR-2272.patch, SOLR-2272.patch > > > Limited join functionality for Solr, mapping one set of IDs matching a query > to another set of IDs, based on the indexed tokens of the fields. > Example: > fq={!join from=parent_ptr to=parent_id}child_doc:query
[jira] [Updated] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-3018: -- Attachment: LUCENE-3018.patch Changed the library folder to: {noformat} ant_lib {noformat} Updated the overview.html file too. > Lucene Native Directory implementation need automated build > --- > > Key: LUCENE-3018 > URL: https://issues.apache.org/jira/browse/LUCENE-3018 > Project: Lucene - Java > Issue Type: Wish > Components: Build >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Varun Thacker >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, > LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, > cpptasks-1.0b5.jar, cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar > > > Currently the native directory impl in contrib/misc require manual action to > compile the c code (partially) documented in > > https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html > yet it would be nice if we had an ant task and documentation for all > platforms how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3018) Lucene Native Directory implementation need automated build
[ https://issues.apache.org/jira/browse/LUCENE-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022724#comment-13022724 ] Simon Willnauer commented on LUCENE-3018: - bq. In my opinion, the folder name of this utility lib should maybe named different, because lib/ is used for runtime libs and not compile-time dependencies. Any comments on this? Simon? I agree, maybe we should call it ant_lib ? {quote}We require all libs in Lucene to be named with version number. Of course the lib should contain it in its manifest, but the name should also contain it. Cpptasks is a utility lib, so its not so important for this case, but for consistency, all other libs that will be copied by the user to his classpath when using Lucene this is important to make sure, that the user can manage the dependencies if he has conflicting version numbers. {quote} Varun, can you change it please? > Lucene Native Directory implementation need automated build > --- > > Key: LUCENE-3018 > URL: https://issues.apache.org/jira/browse/LUCENE-3018 > Project: Lucene - Java > Issue Type: Wish > Components: Build >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Varun Thacker >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, > LUCENE-3018.patch, LUCENE-3018.patch, LUCENE-3018.patch, cpptasks-1.0b5.jar, > cpptasks-LICENSE-ASL.txt, cpptasks.jar, cpptasks.jar > > > Currently the native directory impl in contrib/misc require manual action to > compile the c code (partially) documented in > > https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/contrib/misc/src/java/overview.html > yet it would be nice if we had an ant task and documentation for all > platforms how to compile them and set up the prerequisites. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-1709) Distributed Date and Range Faceting
[ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022697#comment-13022697 ]

Peter Sturge commented on SOLR-1709:
------------------------------------

Yes, the deprecation story makes sense.

Regarding SOLR-1729, I'm pretty sure this already works for 3x (it was originally created on/for the 3x branch). I guess Yonik's NOW changes were destined for trunk, but I've been using the current SOLR-1729 patch on the 3x branch and it is working fine in production environments.

Thanks,
Peter

> Distributed Date and Range Faceting
> -----------------------------------
>
>                 Key: SOLR-1709
>                 URL: https://issues.apache.org/jira/browse/SOLR-1709
>             Project: Solr
>          Issue Type: Improvement
>          Components: SearchComponents - other
>    Affects Versions: 1.4
>            Reporter: Peter Sturge
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: FacetComponent.java, FacetComponent.java, ResponseBuilder.java, SOLR-1709.patch, SOLR-1709_distributed_date_faceting_v3x.patch, solr-1.4.0-solr-1709.patch
>
> This patch adds support for date facets when using distributed searches.
>
> Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to the first by >1 'gap', these 'earlier' or 'later' facets will not be merged in.
> There are several reasons for this:
> * Performance: it's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards.
> * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data).
> This could be dealt with if timezone and skew information were added and the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, so that multiple shards' time data can be normalized.
>
> The patch affects 2 files in the Solr core:
> org.apache.solr.handler.component.FacetComponent.java
> org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage.
> One possible enhancement is to perhaps make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired.
>
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH file from it, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).

--
This message is automatically generated by JIRA.
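The merge strategy described in the SOLR-1709 issue text (the first encountered shard's facet_dates map is the basis, and later shards' counts are merged in only for date keys that already exist, so skewed 'earlier' or 'later' facets are dropped) can be sketched roughly as follows. This is an illustrative reconstruction, not the actual FacetComponent code; the class and method names (DateFacetMerger, merge) are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DateFacetMerger {

    /**
     * Merge per-shard date-facet counts. The first shard's map defines
     * which date buckets exist; subsequent shards only add counts into
     * those existing buckets, matching the caveat in the issue that
     * facets skewed outside the first shard's range are not merged in.
     */
    public static Map<String, Long> merge(List<Map<String, Long>> shardFacets) {
        Map<String, Long> merged = new LinkedHashMap<>();
        boolean first = true;
        for (Map<String, Long> shard : shardFacets) {
            if (first) {
                // first encountered shard: its buckets become the basis
                merged.putAll(shard);
                first = false;
            } else {
                for (Map.Entry<String, Long> e : shard.entrySet()) {
                    // keys not present in the basis map are silently dropped
                    merged.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
                }
            }
        }
        return merged;
    }
}
```

This keeps the merge cost proportional to checking each shard against one map, the performance argument made in the issue, at the price of discarding buckets outside the first shard's window.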
Re: Lucene Jenkins slave out of disk
On Thu, Apr 21, 2011 at 8:42 AM, Uwe Schindler wrote:
> [root@lucene /home/hudson/hudson-slave/workspace]# df -h
> Filesystem                           Size  Used  Avail  Capacity  Mounted on
> zroot/jails/lucene.zones.apache.org   69G   53G   17G       76%  /
>
> I think the master is this time out of disk space.

The disk space was cleared (or at least that's what Jenkins thinks). The slave is now running again.

/niklas
Lucene Jenkins slave out of disk
Hi,

The box running the Jenkins slave for Lucene is out of disk space again. Please check.

/niklas
RE: Lucene Jenkins slave out of disk
OK, I was not aware that you had taken action, too. Because of the message about the Jenkins shutdown, I assumed you were fixing the master.

About the Lucene slave, the problem we have is: our tests need lots of disk space for large indexes. Before the build, the workspace is completely cleared (at least for the nightly builds); for the half-hourly builds it is svn-upped and then cleaned by Ant ('ant clean'). But at the end of the build the build script does not call 'ant clean' again, because otherwise we would lose the test results. I am working on adding an additional task to our Ant build scripts that cleans up only the test data and can be called after the build finishes. This should free unneeded disk space after a build so that later-running builds can reuse the space. I will talk to the other committers, who know better where the temporary test files lie.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Niklas Gustavsson [mailto:nik...@protocol7.com]
> Sent: Thursday, April 21, 2011 8:57 AM
> To: bui...@apache.org
> Cc: dev@lucene.apache.org
> Subject: Re: Lucene Jenkins slave out of disk
>
> On Thu, Apr 21, 2011 at 8:42 AM, Uwe Schindler wrote:
> > [root@lucene /home/hudson/hudson-slave/workspace]# df -h
> > Filesystem                           Size  Used  Avail  Capacity  Mounted on
> > zroot/jails/lucene.zones.apache.org   69G   53G   17G       76%  /
> >
> > I think the master is this time out of disk space.
>
> The disk space was cleared (or at least that's what Jenkins thinks).
> The slave is now running again.
>
> /niklas