Re: Lucene OSGi Bundle
On 19 Sept. 08, at 23:08, Gunnar Wagenknecht wrote:

Hi Lucene Developers, Issue 1344 requests to make the Lucene JAR an OSGi bundle. The approach proposed is to add the OSGi-specific metadata into the MANIFEST.MF of the existing Maven artifacts. I prepared a patch which proposes a different approach. Instead of adding the headers into the Maven JARs, it will create a new set of OSGi JARs. This is basically driven by the following advantages.

1. In OSGi the naming convention for a bundle JAR file is <symbolic name>_<version>.jar. The Maven JARs are not named that way. Therefore, it's not possible to use them out of the box, i.e. one has to download and manually rename them before they can be used. A separate set of OSGi bundles could be consumed directly without any modifications.

Even though it is strongly recommended just for sanity's sake, there is no direct relation between the name of the jar and its OSGi metadata. This hard binding only exists in an Eclipse environment.

2. Maven repositories cannot be consumed directly by OSGi frameworks. It's better to have the OSGi bundle jars in one folder which can be downloaded from mirrors and consumed by frameworks directly.

Actually, it is possible. A Maven plugin exists to manage an OSGi bundle with Maven: http://felix.apache.org/site/apache-felix-maven-osgi-plugin.html And there is a Maven repository which contains OSGi bundles: http://www.springsource.com/repository In fact, that repository is mainly a repository of jars, with different kinds of metadata around it so that different dependency management tools (Maven, Ivy, OBR) can use it.

3. In addition to the OSGi bundle JAR I was able to generate a source jar for Eclipse PDE. Thus, whenever you are developing with Eclipse for *any* OSGi framework, one would simply throw the Lucene OSGi bundle JARs together with the source bundles into the target platform. Eclipse PDE then configures the classpath automatically to attach the source code to the class files. This is very developer friendly.

This could also be done in the source jars in the Maven artifacts. So I think there is no hard requirement for a completely different build to get OSGi metadata into the Lucene jars. The question here is just about the naming convention of the jars. I don't have any objection to having a third distribution layout, but it will somewhat increase the work of the Lucene developers when releasing, because it adds some extra sign/deploy/check work. So if the Lucene developers are up for an OSGi jar naming convention, I think a good build would include the OSGi headers in the manifest of the actual jar (as the patch I provided does), plus an extra task to copy the Maven jars into an OSGi layout. Then there can be a debate on whether or not to use the bnd tool (used as an Ant task in Gunnar's patch, it tries to ease the maintenance of the manifest file). I didn't use it in my patch because the classpath of Lucene is so simple (no dependencies at all) that I thought it would be simpler for Lucene developers to maintain a MANIFEST.MF than a lucene.bnd. Here again, this can be a question of taste for the Lucene developers.

Nicolas
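For illustration, the OSGi metadata in question is just a handful of manifest headers; a minimal sketch for the core jar (the symbolic name, version, and package list here are assumptions for illustration, not what either patch actually generates):

    Manifest-Version: 1.0
    Bundle-ManifestVersion: 2
    Bundle-SymbolicName: org.apache.lucene
    Bundle-Name: Apache Lucene Java
    Bundle-Version: 2.4.0
    Export-Package: org.apache.lucene.analysis;version="2.4.0",
     org.apache.lucene.document;version="2.4.0",
     org.apache.lucene.index;version="2.4.0",
     org.apache.lucene.search;version="2.4.0"

With headers like these in place, the Eclipse-style file name referred to in point 1 would be org.apache.lucene_2.4.0.jar.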
[jira] Resolved: (LUCENE-1396) Improve PhraseQuery.toString()
[ https://issues.apache.org/jira/browse/LUCENE-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1396.

Resolution: Fixed
Fix Version/s: 2.9
Assignee: Michael McCandless

Committed revision 697469 (trunk) and 697470 (2.4). Thanks Andrzej!

Improve PhraseQuery.toString()

Key: LUCENE-1396
URL: https://issues.apache.org/jira/browse/LUCENE-1396
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.4, 2.9
Reporter: Andrzej Bialecki
Assignee: Michael McCandless
Fix For: 2.4, 2.9
Attachments: phraseQuery.patch

PhraseQuery.toString() is overly simplistic, in that it doesn't correctly show phrases with gaps or overlapping terms. This may be misleading when presenting phrase queries built using complex analyzers and filters.
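To see the kind of query the issue is about, consider a phrase built with an explicit position gap, as an analyzer that drops stop words might produce (a sketch; the field and terms are made up):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    // "new york" with a gap at position 1, e.g. from the input "new of york"
    // after stop-word removal:
    PhraseQuery pq = new PhraseQuery();
    pq.add(new Term("body", "new"), 0);
    pq.add(new Term("body", "york"), 2); // position 2 leaves a gap at 1
    System.out.println(pq.toString("body"));

Per the issue description, before the patch the printed form gave no hint of the gap, nor of terms sharing a position.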
Re: Lucene OSGi Bundle
Nicolas Lalevée wrote:
the classpath of Lucene is so simple (no dependencies at all),

Unfortunately, it's not. Some exported packages are split across bundles. This makes it tough to manage. I wish it were as easy as using BND to simply generate the manifests for the existing jars. But it doesn't work without the BND descriptors to get the split packages and the version dependencies on exported/imported packages right.

BTW, the Maven OSGi plug-in is deprecated and has been replaced by this one, which is also based on BND: http://felix.apache.org/site/apache-felix-maven-bundle-plugin-bnd.html

-Gunnar

--
Gunnar Wagenknecht
[EMAIL PROTECTED]
http://wagenknecht.org/
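For reference, bnd expresses the split-package case with a per-clause directive in its descriptor; a hypothetical one-line fragment (the package named here is only an example of a split package, not taken from the actual patch):

    Export-Package: org.apache.lucene.analysis;-split-package:=merge-first;version="2.4.0"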
Re: Could positions/payloads in SegmentMerger be copied directly?
This part is indeed quite tricky... I'll try to take a stab at it.

Paul Elschot wrote:
On Friday 19 September 2008 17:05:29, Michael McCandless wrote:
Not quite, because how positions are encoded depends on whether any payload appeared in that segment. However, if 1) the input is a SegmentReader (since in general we can merge any IndexReader), and 2) its format is congruent with the format we are writing (i.e. both don't or do use the payloads format), which ought to be true the vast majority of the time, then I think we could simply copy bytes. Since the next TermInfo tells us the proxPointer where it begins, we know exactly how many bytes to copy. I think this'd be a nice optimization!

I tried to find a way to do this, but I'm stuck at the point where the proxPointer is needed from a TermInfo. I got this far (uncompiled code; smi is the SegmentMergeInfo that is currently merged):

if (smi.reader instanceof SegmentReader) {
  SegmentReader inputReader = (SegmentReader) smi.reader;
  boolean readerStorePayloads =
      inputReader.fieldInfos.fieldInfo(smi.term.field).storePayloads;
  if (storePayloads == readerStorePayloads) {
    // take the difference of the two prox pointers:
    int positionsLength = inputReader.tis. ... - ...;
    // do a direct byte copy from inputReader to proxOutput:
    ... ;
  }
}

but I could not find out how to get from the TermInfosReader at inputReader.tis to the next prox pointer. SegmentMerger never needs to index the positions by using a proxPointer itself, as it accesses all positions serially. This leaves me without an example of how to use proxPointer from a TermInfo. Any tips on how to continue?

Regards, Paul Elschot

Mike

Paul Elschot wrote:
I'm looking at the for loop in SegmentMerger.java at line 666, which completely interprets the input positions/payloads for an input term at a document. The positions/payloads don't change when they are merged, is that correct? I'm wondering whether this loop could be replaced by a direct copy from the input postings to proxOutput.

Regards, Paul Elschot
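Assuming the next term's prox pointer can be obtained somehow (the open question above), the copy itself could be a plain buffered transfer between the segment's prox streams using Lucene's IndexInput/IndexOutput; a sketch, where nextProxPointer, proxInput, and termInfo are assumed names, not existing accessors:

    long positionsLength = nextProxPointer - termInfo.proxPointer;
    byte[] buffer = new byte[4096];
    proxInput.seek(termInfo.proxPointer);
    while (positionsLength > 0) {
      int chunk = (int) Math.min(buffer.length, positionsLength);
      proxInput.readBytes(buffer, 0, chunk); // raw prox bytes, left uninterpreted
      proxOutput.writeBytes(buffer, chunk);  // written unchanged to the merged segment
      positionsLength -= chunk;
    }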
Re: Realtime Search for Social Networks Collaboration
Agreed, it's a system that is of value to a subset of cases.

On Sat, Sep 20, 2008 at 4:04 PM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote:
Moving back to the RDBMS model will be a big step backwards, where we miss multivalued fields and arbitrary fields.

On Tue, Sep 9, 2008 at 4:17 AM, Jason Rutherglen [EMAIL PROTECTED] wrote:
Cool. I mention H2 because it does have some Lucene code in it, yes. Also, according to some benchmarks it's the fastest of the open source databases. I think it's possible to integrate realtime search for H2. I suppose there is no need to store the data in Lucene in this case? One loses the multiple values per field Lucene offers, and the schema becomes static. Perhaps it's a trade off?

On Mon, Sep 8, 2008 at 6:17 PM, J. Delgado [EMAIL PROTECTED] wrote:
Yes, both Marcelo and I would be interested. We looked into H2 and it looks like something similar to Oracle's ODCI can be implemented. Plus, the primitive full-text implementation is based on Lucene. I say primitive because, looking at the code, I saw that one cannot define an Analyzer, and for each scan corresponding to a where clause a searcher is opened and closed instead of having a pool; plus it does not have any way to queue changes to reduce the use of the IndexWriter, etc. But it's open source and that is a great starting point!
-- Joaquin

On Mon, Sep 8, 2008 at 2:05 PM, Jason Rutherglen [EMAIL PROTECTED] wrote:
Perhaps an interesting project would be to integrate Ocean with H2 (www.h2database.com) to take advantage of both models. I'm not sure how exactly that would work, but it seems like it would not be too difficult. Perhaps this would solve being able to perform faster hierarchical queries, and perhaps other types of queries that Lucene is not capable of. Is this something, Joaquin, you are interested in collaborating on? I am definitely interested in it.

On Sun, Sep 7, 2008 at 4:04 AM, J. Delgado [EMAIL PROTECTED] wrote:
On Sat, Sep 6, 2008 at 1:36 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
Regarding real-time search and Solr, my feeling is the focus should be on first adding real-time search to Lucene, and then we'll figure out how to incorporate that into Solr later.

Otis, what do you mean exactly by "adding real-time search to Lucene"? Note that Lucene, being an indexing/search library (and not a full-blown search engine), is by definition real-time: once you add/write a document to the index it becomes immediately searchable, and a logically deleted document is immediately no longer returned in searches, even though physical deletion happens during an index optimization.

Now, the problem of adding/deleting documents in bulk, as part of a transaction, and making these documents available for search immediately after the transaction is committed sounds more like a search engine problem (i.e. SOLR, Nutch, Ocean), especially if these transactions are known to be I/O expensive and thus are usually implemented as batched processes with some kind of sync mechanism, which makes them non-real-time. For example, in my previous life, I designed and helped implement a quasi-realtime enterprise search engine using Lucene, having a set of multi-threaded indexers hitting a set of multiple indexes allocated across different search services, which powered a broker-based distributed search interface.
The most recent documents provided to the indexers were always added to the smaller in-memory (RAM) indexes, which usually could absorb the load of a bulk add transaction, and later would be merged into larger disk-based indexes and then flushed to make them ready to absorb fresh new docs. We even had further partitioning of the indexes that reflected time periods, with caps on size, for them to be merged into older, more archive-like indexes which were used less (yes, the search engine's default search was on data no more than 1 month old, though the user could open the time window by including archives).

As for SOLR and OCEAN, I would argue that these semi-structured search engines are becoming more and more like relational databases with full-text search capabilities (without the benefit of full relational algebra -- for example, joins are not possible using SOLR). Notice that real-time CRUD operations and transactionality are core DB concepts and have been studied and developed by the database communities for quite a long time. There have been recent efforts to efficiently integrate Lucene into relational databases (see the Lucene JVM Oracle integration: http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html).

I think we should seriously look at joining efforts with open-source database engine projects written in Java (see http://java-source.net/open-source/database-engines) in order to blend IR and ORM once and for all.

-- Joaquin
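The RAM-buffer-then-merge pattern described above can be approximated with stock Lucene; a minimal sketch against the 2.x API (the index path is illustrative and exception handling is omitted):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    // Buffer fresh documents in a small RAM index...
    RAMDirectory ram = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ram, new StandardAnalyzer(), true);
    Document doc = new Document();
    doc.add(new Field("body", "a fresh document", Field.Store.NO, Field.Index.TOKENIZED));
    ramWriter.addDocument(doc);
    ramWriter.close();

    // ...then periodically fold the RAM index into the large disk index.
    Directory disk = FSDirectory.getDirectory("/path/to/index");
    IndexWriter diskWriter = new IndexWriter(disk, new StandardAnalyzer(), false);
    diskWriter.addIndexes(new Directory[] { ram });
    diskWriter.close();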
[jira] Commented: (LUCENE-1385) IndexReader.isIndexCurrent()==false - IndexReader.reopen() - still index not current
[ https://issues.apache.org/jira/browse/LUCENE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633083#action_12633083 ]

Michael McCandless... sorry, Uwe Schindler commented on LUCENE-1385:

By the way: the index was optimized after the change by the other process modifying the index; maybe this is the problem.

IndexReader.isIndexCurrent()==false - IndexReader.reopen() - still index not current

Key: LUCENE-1385
URL: https://issues.apache.org/jira/browse/LUCENE-1385
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.2
Environment: Linux, Solaris, Windows XP
Reporter: Uwe Schindler
Attachments: LUCENE-1385.patch

I found a strange error occurring with IndexReader.reopen. It is not always reproducible; it only happens sometimes, but strangely on all my computers with different platforms at the same time. Maybe it has something to do with the timestamp used in index versions. I have a search server using an IndexReader that is opened at webapp startup and should stay open. Every half an hour this web application checks if the index is still current using IndexReader.isCurrent(). When a parallel indexing job (in another virtual machine) modifies the index, isCurrent() returns FALSE. The half-hourly cron job then uses IndexReader.reopen() to reopen the index. But sometimes, directly after reopen(), the index is still not current (and no updates occur). Calling reopen again does not change it either. Searching on the index shows all new/updated documents, but isCurrent() still returns false. The problem with this is that the index is now reopened all the time, because the detection of a current index no longer works. I now have a workaround in my code to handle this: after calling IndexReader.reopen(), I test IndexReader.isCurrent(), and if it is not current, I close the reader hard and open a new instance. Most times IndexReader.reopen works correctly, but sometimes this error occurs. Looking into the code of reopen(), I realized that there is an extra check for whether the index has modifications, and if so the reopen call returns the original reader (this may be the problem I have). But the IndexReader is only used for searching; no updates occur. My questions: Why is there this check for modifications in reopen()? Why does this happen only at certain times, on all my servers with different platforms? I want to use reopen because in the future, when the new FieldCache is reopen-aware and does not rebuild the full cache every time, it will be very important to have this fixed. At the moment, I have no problem with the case that reopen may fail and I have to do a rough reopen.
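The workaround described amounts to something like the following (a sketch; the reader and directory fields are assumed):

    IndexReader newReader = reader.reopen();
    if (newReader == reader && !newReader.isCurrent()) {
      // reopen() handed back the same instance even though the index
      // version changed on disk: fall back to a hard close/open.
      reader.close();
      newReader = IndexReader.open(directory);
    }
    reader = newReader;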
Re: Multi Field search without Multifieldqueryparser
Now I can have two types of queries:
Structured query: name: George Bush AND Occupation: President

please don't remind us!

Try asking this question on the [EMAIL PROTECTED] list; that list is for usage-related questions.

ryan
Re: Multi Field search without Multifieldqueryparser
Hi Ryan,

Apparently it's not, because as far as I know Lucene doesn't support this function and I am planning to develop it.

Anshul

On Sun, Sep 21, 2008 at 8:04 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
Now I can have two types of queries:
Structured query: name: George Bush AND Occupation: President
please don't remind us!
Try asking this question on the [EMAIL PROTECTED] list; that list is for usage-related questions.
ryan

--
Anshul Jain
[jira] Commented: (LUCENE-1385) IndexReader.isIndexCurrent()==false - IndexReader.reopen() - still index not current
[ https://issues.apache.org/jira/browse/LUCENE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633089#action_12633089 ]

Michael McCandless commented on LUCENE-1385:

OK, I think I found the bug. From those prints above I can see your current IndexReader was opened when the index had a single segment (so, it's a SegmentReader). And the changed index also has a single segment by the same name... so we call SegmentReader.reopenSegment to do the reopening, which has logic to return itself if it detects no changes (to norms or deletions). You are somehow hitting that logic. The bug seems to boil down to: somehow, IndexWriter is writing a new segments_N file for a single-segment index even though no actual changes were made to the segment. The bug is rather harmless: the reopen call does no real work (it just returns your current IndexReader instance), and it's doing that because there were in fact no actual changes to the index; just, somehow, a new segments_N file was written. I found one case where IndexWriter can do this, which is if you open the writer, call deleteDocuments where no docs actually match the Term, then close the writer. Is it possible that your indexing job wakes up and only makes calls to deleteDocuments, yet no documents match the deleted terms? If not... can you capture the details of exactly what your indexing job did just before you hit the reopen failure? It could be another no-op action in IndexWriter that then writes a segments_N file.
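The no-op case Mike describes would look roughly like this (a hypothetical reproduction against the 2.3.x API; dir is the index Directory, and the field/term values are made up):

    IndexReader reader = IndexReader.open(dir);

    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
    writer.deleteDocuments(new Term("id", "no-such-value")); // matches nothing
    writer.close(); // still writes a new segments_N

    // The reader now reports the index as changed, but reopen() sees no
    // real difference in the single segment and returns the same instance:
    IndexReader reopened = reader.reopen();
    // reopened == reader, yet reopened.isCurrent() remains false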
[jira] Commented: (LUCENE-1385) IndexReader.isIndexCurrent()==false - IndexReader.reopen() - still index not current
[ https://issues.apache.org/jira/browse/LUCENE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633091#action_12633091 ]

Michael McCandless commented on LUCENE-1385:

I have a test case that shows the above failure. But on 2.4 it does not fail -- the bug was already fixed as a byproduct of LUCENE-1194 (adding delete-by-query to IndexWriter).
Re: 2.4 release candidate 1
OK, so I wrote yet another way to do the signing, in Python (which I'll happily find any excuse to use ;) -- it prompts for your passphrase and then recurses through the dist directory looking for artifacts to sign:

import sys
import os
import subprocess
import getpass

def signFile(pwd, fileName):
    print '\nSIGN %s' % fileName
    command = 'gpg --passphrase-fd 0 --batch --armor --detach-sig %s' % fileName
    print '  command %s' % command
    ascFileName = fileName + '.asc'
    if os.path.exists(ascFileName):
        os.remove(ascFileName)
    p = subprocess.Popen(command, shell=True, stdin=subprocess.PIPE)
    p.stdin.write(pwd)
    p.stdin.close()
    result = p.wait()
    if result != 0:
        raise RuntimeError('command failed: exit code %s' % result)

def isArtifact(fileName):
    for suffix in ('.tar.gz', '.jar', '.zip', '.pom'):
        if fileName.endswith(suffix):
            return True
    else:
        return False

def main(argv):
    if len(argv) != 2:
        print '\nUsage: python %s distRootDirName\n' % argv[0]
        return
    pwd = getpass.unix_getpass(prompt='\nPlease enter your GPG private key passphrase:')
    for dirPath, dirNames, fileNames in os.walk(argv[1]):
        for fileName in fileNames:
            if isArtifact(fileName):
                signFile(pwd, os.path.join(dirPath, fileName))

if __name__ == '__main__':
    main(sys.argv)

Mike

Nicolas Lalevée wrote:
On 19 Sept. 08, at 15:21, Grant Ingersoll wrote:
FWIW, here's a simple bash function to do it too:

function sign-artifacts() {
    gpg --armor --output $1-$2.pom.asc --detach-sig $1-$2.pom
    if [ -f $1-$2-javadoc.jar ]; then
        gpg --armor --output $1-$2-javadoc.jar.asc --detach-sig $1-$2-javadoc.jar
    fi
    if [ -f $1-$2-sources.jar ]; then
        gpg --armor --output $1-$2-sources.jar.asc --detach-sig $1-$2-sources.jar
    fi
    if [ -f $1-$2.jar ]; then
        gpg --armor --output $1-$2.jar.asc --detach-sig $1-$2.jar
    fi
}

I call it as sign-artifacts <artifact id> <version number>, e.g. sign-artifacts solr-common 1.3.0. I suppose it could be put into a loop that recurses through sub-dirs.

You might also be interested in the read function, which avoids entering the passphrase for every artifact: https://svn.apache.org/repos/asf/ant/ivy/ivyde/trunk/signArtifacts.sh

Nicolas

-Grant

On Sep 18, 2008, at 7:16 PM, Michael McCandless wrote:
Yeah, I was afraid of this :) I'll look at SOLR-776. Thanks for the pointer!
Mike

Grant Ingersoll wrote:
FYI, Mike, you might be interested in https://issues.apache.org/jira/browse/SOLR-776 for signing the Maven artifacts (what a PITA). I know Michael B. has a batch script, but this does it in an Ant-friendly way and is available for all RMs.
Cheers, Grant

On Sep 18, 2008, at 2:29 PM, Michael McCandless wrote:
Hi, I just created the first release candidate for 2.4, here: http://people.apache.org/~mikemccand/staging-area/lucene2.4rc1 Please download the release candidate, kick the tires, and report back on any issues you encounter. The plan is to make only serious bug fixes, or build/doc fixes, to 2.4 for ~10 days, after which, if there are no blockers, I'll call a vote for the actual release. Happy testing, and thanks!
Mike
[jira] Commented: (LUCENE-1387) Add LocalLucene
[ https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633102#action_12633102 ]

Karl Wettin commented on LUCENE-1387:

bq. I'm struggling to get two of the existing tests to pass...

I don't think it is from my modifications, since they don't pass on the original either. On my box the test fails with different results due to the writer not being committed in setUp, giving me 0 results. After adding a commit it fails with the results you are reporting here. Is it possible that you are getting one sort of result in the original due to the non-committed writer, and another error in this version due to your changes to the distance measurement? All points in the list are rather close to each other, so very small changes to the algorithm might be the problem. I have a hard time tracing the code, and I'm sort of hoping this might be the problem.

Add LocalLucene

Key: LUCENE-1387
URL: https://issues.apache.org/jira/browse/LUCENE-1387
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Grant Ingersoll
Priority: Minor
Attachments: spatial.zip

Local Lucene (Geo-search) has been donated to the Lucene project, per https://issues.apache.org/jira/browse/INCUBATOR-77. This issue is to handle the Lucene portion of integration. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene
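The missing commit Karl mentions would be a one-line addition to the test's setUp; a sketch (the field names and the addTestPoints helper are assumptions, not the actual LocalLucene test code):

    protected void setUp() throws Exception {
      super.setUp();
      directory = new RAMDirectory();
      writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
      addTestPoints(writer); // hypothetical helper that indexes the sample points
      writer.commit();       // without this, a reader opened later sees 0 documents
    }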
Re: Realtime Search for Social Networks Collaboration
On Sat, Sep 20, 2008 at 1:04 PM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote:
Moving back to the RDBMS model will be a big step backwards, where we miss multivalued fields and arbitrary fields.

No one is suggesting to lose any of the virtues of the field-based indexing that Lucene provides. Quite the contrary: by extending the RDBMS model with Lucene-based indexes one can map relational rows to documents and columns to fields. Note that one relational field can be mapped to one or more text-based fields, and multi-valued fields will still be allowed. Please check the Lucene OJVM implementation for details on the implementation and philosophy of the RDBMS-Lucene converged model: http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg

More discussion is at Marcelo's blog; he will be presenting at Oracle World 2008 this week: http://marceloochoa.blogspot.com/

BTW, it just so happens that this was implemented using Oracle, but a similar implementation in H2 seems not only feasible but desirable.

-- Joaquin
Re: Realtime Search for Social Networks Collaboration
Sorry, I meant loose (replacing lose).

On Sun, Sep 21, 2008 at 8:38 PM, J. Delgado [EMAIL PROTECTED] wrote:
No one is suggesting to lose any of the virtues of the field-based indexing that Lucene provides. [...]