Error in Integrating JBoss 4.2 and Solr-1.3.0:
I am trying to integrate JBoss and Solr (multicore). To get started, I am trying to deploy a single instance of Solr.

1) I edited C:/jboss/jboss-4.2.1.GA/server/default/conf/jboss-service.xml and entered the following details:

http://www.w3.org/2001/XMLSchema-instance"; xmlns:jndi="urn:jboss:jndi-binding-service:1.0" xs:schemaLocation="urn:jboss:jndi-binding-service:1.0 resource:jndi-binding-service_1_0.xsd" > C:\apache-solr-1.3.0\example\solr

2) Copied the war file from the Solr distribution, renamed it to solr.zip, unzipped it, made the following changes, bundled it back into a zip and then into a war, and placed it in the default/deploy folder of JBoss.

a) Edited web.xml and added the following just before the closing tag: solr/home java.lang.String

b) Created a jboss-web.xml file inside the WEB-INF folder. The file contains: solr solr/home /solr/home

But when I start the server I get the following error:

18:25:25,229 ERROR [URLDeploymentScanner] Incomplete Deployment listing:
--- Packages waiting for a deployer ---
[EMAIL PROTECTED] { url=file:/C:/jboss/jboss-4.2.1.GA/server/default/deploy/solr.war }
  deployer: null
  status: null
  state: INIT_WAITING_DEPLOYER
  watch: file:/C:/jboss/jboss-4.2.1.GA/server/default/deploy/solr.war
  altDD: null
  lastDeployed: 1224766525227
  lastModified: 1224766525226
  mbeans:
--- Incompletely deployed packages ---
[EMAIL PROTECTED] { url=file:/C:/jboss/jboss-4.2.1.GA/server/default/deploy/solr.war }
  deployer: null
  status: null
  state: INIT_WAITING_DEPLOYER
  watch: file:/C:/jboss/jboss-4.2.1.GA/server/default/deploy/solr.war
  altDD: null
  lastDeployed: 1224766525227
  lastModified: 1224766525226
  mbeans:

Is there anything wrong in the steps I followed, or did I miss some steps? Is there any other good documentation on this subject? Also, where can I specify the index directory? Any suggestion/advice is much appreciated.
Thanks con -- View this message in context: http://www.nabble.com/Error-in-Integrating-JBoss-4.2-and-Solr-1.3.0%3A-tp20202032p20202032.html Sent from the Solr - User mailing list archive at Nabble.com.
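For readers of the archive: the snippet described in step 2a is a solr/home JNDI entry, which usually takes the following shape. This is a sketch based on the standard Solr JNDI setup, not the poster's exact file; only the env-entry-value path is taken from the message.

```xml
<!-- WEB-INF/web.xml: a solr/home JNDI entry of the kind described in
     step 2a, placed just before the closing </web-app> tag. The value
     is the Solr home path mentioned in the message. -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-type>java.lang.String</env-entry-type>
  <env-entry-value>C:/apache-solr-1.3.0/example/solr</env-entry-value>
</env-entry>
```

As for the last question above: the index directory itself is configured per Solr instance in solrconfig.xml via the dataDir element.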
Question about textTight
Hi,

So I've been using the textTight field to hold filenames, and I've run into a weird problem. Basically, people want to search by part of a filename (say, the filename is stm0810m_ws_001ftws and they want to find everything starting with stm0810m_, i.e. stm0810m_*). I'm hoping someone might have done this before (I bet someone has).

Lots of things work: you can search for stm0810m_ws_001ftws and get a result, or (stm 0810 m*), or various other combinations. What does not work is searching for (stm0810m_*) or (stm 0810 m_*) or anything like that, which is a problem, because often they don't want things with ma_ or mx_, but just m_. It's almost like underscores just break everything; escaping them does nothing.

Here's the field definition (it should be what came with my Solr): positionIncrementGap="100" > synonyms="synonyms.txt" ignoreCase="true" expand="false"/> words="stopwords.txt"/> generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/> protected="protwords.txt"/>

and usage:

Now, I thought textTight would be good because it's the one best suited for SKUs, but I guess I'm wrong. What should I be using for this? Would changing any of these "generateWordParts" or "catenateAll" options help? I can't seem to find any documentation, so I'm really not sure what they would do, but reindexing this whole thing will take quite some time, so I'd rather know what will actually work before I just start changing things.

Thanks so much for any insight!

-- Steve
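For reference, the stock Solr 1.3 "textTight" type that the stripped snippet above appears to come from looks roughly like this (a reconstruction from the example schema, not the poster's exact definition; the WordDelimiterFilterFactory parameters are the ones quoted in the message):

```xml
<!-- Sketch of the stock Solr 1.3 "textTight" analysis chain -->
<fieldType name="textTight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
    <!-- generateWordParts="0": the underscore-separated pieces are NOT
         indexed individually; only the catenated forms are -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1"
            catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

One plausible reading of the symptom: WordDelimiterFilterFactory strips the underscores at index time, but wildcard queries are not run through the analyzer, so a query containing a literal underscore like stm0810m_* has nothing to match against.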
Re: replication handler - compression
> It is useful only if your bandwidth is very low.
> Otherwise the cost of copying/compressing/decompressing can take up
> more time than we save.

I mean compressing and transferring. If the optimized index itself has a very high compression ratio, then it is worth exploring the option of compressing and transferring. And do not assume that all the files in the index directory are transferred during replication. It only transfers the files which are used by the current commit point and which are absent on the slave.

> On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins
> <[EMAIL PROTECTED]> wrote:
>> Is there an option on the replication handler to compress the files?
>>
>> I'm trying to replicate off site, and seem to have accumulated about
>> 1.4gb. When compressed with winzip of all things I can get this down to
>> about 10% of the size.
>>
>> Is compression in the pipeline / can it be if not!
>>
>> simon
>>
>> This message has been scanned for malware by SurfControl plc.
>> www.surfcontrol.com
>
> --
> --Noble Paul

--
--Noble Paul
Re: timeouts
I may be a bit off the mark, but it seems that DataImportHandler may be able to do this very easily for you.

http://wiki.apache.org/solr/DataImportHandler#jdbcdatasource

On Fri, Oct 24, 2008 at 6:28 PM, Simon Collins <[EMAIL PROTECTED]> wrote:
> Hi
>
> We're running Solr on a Win 2k3 box under Tomcat with about 100,000 records.
> When doing large updates of records via SolrSharp, Solr completely freezes
> and doesn't come back until we restart Tomcat.
>
> This has only started happening since putting MySQL on the same box (as a
> source of the data to update from).
>
> Are there any known issues with running Solr and MySQL on the same box? When
> it's frozen, the CPU usage is around 1-2%, so we're not exactly out of
> resources!
>
> Am I best using something else instead of Tomcat? We're still trialling Solr
> (presently, used for our main site search www.shoe-shop.com and search and
> navigation for our microsites). It's an excellent search product, but I
> don't want to fork out on new hardware for it just yet, until I know more
> about the performance and which environment I'm best to go for (win/linux).
>
> If anyone has any suggestions/needs more info, I'd be extremely grateful.
>
> Thanks
> Simon
>
> Simon Collins
> Systems Analyst
>
> Telephone: 01904 606 867
> Fax Number: 01904 528 791
> shoe-shop.com ltd
> Catherine House
> Northminster Business Park
> Upper Poppleton, YORK
> YO26 6QU
>
> www.shoe-shop.com

--
--Noble Paul
Re: Advice needed on master-slave configuration
This is the JIRA location: https://issues.apache.org/jira/secure/Dashboard.jspa

The trunk has not changed a lot since the 1.3 release. If it works for you, you can just stick to the one you are using till you get a patch.

--Noble

On Mon, Oct 27, 2008 at 9:04 PM, William Pierce <[EMAIL PROTECTED]> wrote:
> Folks:
>
> The replication handler works wonderfully! Thanks all! Now can someone
> point me at a wiki so I can submit a jira issue lobbying for the inclusion
> of this replication functionality in a 1.3 patch?
>
> Thanks,
> - Bill
>
> --
> From: "Noble Paul ??? ??" <[EMAIL PROTECTED]>
> Sent: Thursday, October 23, 2008 10:34 PM
> To:
> Subject: Re: Advice needed on master-slave configuration
>
>> It was committed on 10/21.
>>
>> Take the latest 10/23 build:
>> http://people.apache.org/builds/lucene/solr/nightly/solr-2008-10-23.zip
>>
>> On Fri, Oct 24, 2008 at 2:27 AM, William Pierce <[EMAIL PROTECTED]>
>> wrote:
>>>
>>> I tried the nightly build from 10/18 -- I did the following:
>>>
>>> a) I downloaded the nightly build of 10/18 (the zip file).
>>>
>>> b) I unpacked it and copied the war file to my tomcat lib folder.
>>>
>>> c) I made the relevant changes in the config files per the instructions
>>> shown in the wiki.
>>>
>>> When tomcat starts, I see the error message in the tomcat logs...
>>>
>>> Caused by: java.lang.ClassNotFoundException: solr.ReplicationHandler
>>>   at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1358)
>>>   at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1204)
>>>   at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>>>   at java.lang.Class.forName0(Native Method)
>>>   at java.lang.Class.forName(Class.java:247)
>>>   at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:258)
>>>   ... 36 more
>>>
>>> Where do I get the nightly bits that will enable me to try this
>>> replication handler?
>>> >>> Thanks, >>> - Bill >>> >>> -- >>> From: "Noble Paul ??? ??" <[EMAIL PROTECTED]> >>> Sent: Wednesday, October 22, 2008 10:51 PM >>> To: >>> Subject: Re: Advice needed on master-slave configuration >>> If you are using a nightly you can try the new SolrReplication feature http://wiki.apache.org/solr/SolrReplication On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]> wrote: > > Otis, > > Yes, I had forgotten that Windows will not permit me to overwrite > files > currently in use. So my copy scripts are failing. Windows will not > even > allow a rename of a folder containing a file in use so I am not sure > how > to > do this > > I am going to dig around and see what I can come up with short of > stopping/restarting tomcat... > > Thanks, > - Bill > > > -- > From: "Otis Gospodnetic" <[EMAIL PROTECTED]> > Sent: Wednesday, October 22, 2008 2:30 PM > To: > Subject: Re: Advice needed on master-slave configuration > >> Normally you don't have to start Q, but only "reload" Solr searcher >> when >> the index has been copied. >> However, you are on Windows, and its FS has the tendency not to let >> you >> delete/overwrite files that another app (Solr/java) has opened. Are >> you >> able to copy the index from U to Q? How are you doing it? Are you >> deleting >> index files from the index dir on Q that are no longer in the index >> dir >> on >> U? >> >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Message >>> >>> From: William Pierce <[EMAIL PROTECTED]> >>> To: solr-user@lucene.apache.org >>> Sent: Wednesday, October 22, 2008 5:24:28 PM >>> Subject: Advice needed on master-slave configuration >>> >>> Folks: >>> >>> I have two instances of solr running one on the master (U) and the >>> other >>> on >>> the slave (Q). Q is used for queries only, while U is where >>> updates/deletes >>> are done. I am running on Windows so unfortunately I cannot use the >>> distribution scripts. 
>>> Every N hours when changes are committed and the index on U is updated,
>>> I want to copy the files from the master to the slave. Do I need to halt
>>> the Solr server on Q while the index is being updated? If not, how do I
>>> copy the files into the data folder while the server is running? Any
>>> pointers would be greatly appreciated!
>>>
>>> Thanks!
>>>
>>> - Bill
>>
>> --
>> --Noble Paul
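For anyone following this thread from the archive: the SolrReplication feature referred to above is wired up in solrconfig.xml roughly as follows (a sketch following the SolrReplication wiki page; the master host, port and poll interval are placeholders):

```xml
<!-- On the master (machine U): publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On the slave (machine Q): poll the master over HTTP -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Because replication runs over HTTP inside Solr, no external copy scripts are needed, which sidesteps the Windows file-locking problem discussed earlier in the thread.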
Re: replication handler - compression
Are you sure you optimized the index? It is useful only if your bandwidth is very low. Otherwise the cost of copying/compressing/decompressing can take up more time than we save.

On Tue, Oct 28, 2008 at 2:49 AM, Simon Collins <[EMAIL PROTECTED]> wrote:
> Is there an option on the replication handler to compress the files?
>
> I'm trying to replicate off site, and seem to have accumulated about
> 1.4gb. When compressed with winzip of all things I can get this down to
> about 10% of the size.
>
> Is compression in the pipeline / can it be if not!
>
> simon

--
--Noble Paul
Re: Entity extraction?
On Oct 27, 2008, at 8:53 PM, Ryan McKinley wrote: On Oct 27, 2008, at 6:10 PM, Grant Ingersoll wrote: Warning: shameless plug: Tom Morton and I have a chapter on NER and OpenNLP (and Solr, for that matter) in our book "Taming Text" (Manning) and the code will be open once we have a place to put it (hopefully soon). In fact, you'll see us doing a lot of this kind of stuff w/ Solr and it should all be coming back to Solr/ Lucene/Mahout at some point (for instance, see https://issues.apache.org/jira/browse/SOLR-769 , as I'm sure FAST told you they can do clustering, too!) --end shameless plug --- thats great! I just got the MEAP copy, it looks really good http://www.manning.com/ingersoll/ Thanks! As for Mahout, NER is a classification problem, and there are some tools in Mahout to do classification, but nothing specifically targeted at NER at the moment. Mahout, like Nutch, also takes advantage of Hadoop for scaling. The combination of Mahout in Solr makes a lot of sense, IMO. Perhaps this is more appropriate to ask on the mahout list, but... when you say "Mahout, like Nutch, also takes advantage of Hadoop for scaling", does that mean that much of Mahout requires hadoop? Is it possible to do smaller scale problems on a simple setup and only invoke hadoop when required? Yes, probably better asked on Mahout, but to answer your question, yes, most of the implementations require Hadoop so far, but it is not a strict requirement. That being said, it is fairly easy to run them on a simple setup (i.e. single node).
Re: Entity extraction?
On Oct 27, 2008, at 6:10 PM, Grant Ingersoll wrote:

Warning: shameless plug: Tom Morton and I have a chapter on NER and OpenNLP (and Solr, for that matter) in our book "Taming Text" (Manning) and the code will be open once we have a place to put it (hopefully soon). In fact, you'll see us doing a lot of this kind of stuff w/ Solr and it should all be coming back to Solr/Lucene/Mahout at some point (for instance, see https://issues.apache.org/jira/browse/SOLR-769 , as I'm sure FAST told you they can do clustering, too!) --end shameless plug ---

That's great! I just got the MEAP copy, it looks really good: http://www.manning.com/ingersoll/

As for Mahout, NER is a classification problem, and there are some tools in Mahout to do classification, but nothing specifically targeted at NER at the moment. Mahout, like Nutch, also takes advantage of Hadoop for scaling. The combination of Mahout in Solr makes a lot of sense, IMO.

Perhaps this is more appropriate to ask on the mahout list, but... when you say "Mahout, like Nutch, also takes advantage of Hadoop for scaling", does that mean that much of Mahout requires hadoop? Is it possible to do smaller scale problems on a simple setup and only invoke hadoop when required?

ryan
Re: Entity extraction?
Warning: shameless plug: Tom Morton and I have a chapter on NER and OpenNLP (and Solr, for that matter) in our book "Taming Text" (Manning) and the code will be open once we have a place to put it (hopefully soon). In fact, you'll see us doing a lot of this kind of stuff w/ Solr and it should all be coming back to Solr/Lucene/ Mahout at some point (for instance, see https://issues.apache.org/jira/browse/SOLR-769 , as I'm sure FAST told you they can do clustering, too!) --end shameless plug --- As for Mahout, NER is a classification problem, and there are some tools in Mahout to do classification, but nothing specifically targeted at NER at the moment. Mahout, like Nutch, also takes advantage of Hadoop for scaling. The combination of Mahout in Solr makes a lot of sense, IMO. On Oct 25, 2008, at 11:25 PM, Vaijanath N. Rao wrote: Hi, One can use the OpenNLP Max entropy library and create there own named-entity extraction. I had used it in one of the projects which I did with Solr. It is easy to integrate most of the NLP libraries with Solr. Though we had named-entity extraction embedded in our crawler which would populate a field called entities in the database, which we would ingest in Solr as yet another field. --Thanks and Regards Vaijanath N. Rao Julien Nioche wrote: Hi, Open Source NLP platforms like GATE (http://gate.ac.uk) or Apache UIMA are typically used for these types of tasks. GATE in particular comes with an application called ANNIE which does Named Entity Recognition. OpenCalais does that as well and should be easy to embed, but it can't be tuned to do more specific things unlike UIMA or GATE based applications. Depending on the architecture you have in mind it could be worth investigating Nutch and add the NER as a custom plugin; NLP being often a CPU intensive task you could leverage the scalability of Hadoop in Nutch. There is a patch which allows to delegate the indexing to SOLR. 
As someone else already said these named entities could then be used as facets. HTH Julien -- Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
replication handler - compression
Is there an option on the replication handler to compress the files?

I'm trying to replicate off site, and seem to have accumulated about 1.4gb. When compressed with winzip of all things I can get this down to about 10% of the size.

Is compression in the pipeline / can it be if not!

simon
Re: solr 1.3 language managing boosting
Boost at index time or at query time? For index time, you would add the boost on the field/document. At query time, you can add boosts to each term that belongs to a specific field. On Oct 27, 2008, at 2:10 PM, sunnyfr wrote: Hi, I've my field in the schema which are text_es, text_fr, text_ln And I would like to boost them according the field language, How could I do that, According to the fact that I've stored all this field ??? Thanks a lot for your help, Sunny -- View this message in context: http://www.nabble.com/solr-1.3-language-managing-boosting-tp20193102p20193102.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans. http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
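A concrete sketch of the query-time option, using the dismax handler and the field names from the question (the boost values are arbitrary examples, not a recommendation):

```xml
<!-- solrconfig.xml: a dismax handler whose qf parameter weights the
     per-language text fields differently (example boosts only) -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text_fr^2.0 text_es^1.0 text_ln^0.5</str>
  </lst>
</requestHandler>
```

With the standard request handler, the same effect can be written inline per query term, e.g. q=text_fr:(chanson)^2 OR text_es:(chanson). Index-time boosts, by contrast, go on the field or document in the update message and require reindexing to change.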
Re: Delete by query isn't working for a specific query
Thank you both for your nice answers. I will try it out.

2008/10/27 Erik Hatcher <[EMAIL PROTECTED]>
> I don't think delete-by-query supports purely negative queries, even though
> they are supported for q and fq parameters for searches.
>
> Try using:
>
> *:* AND -deptId:[1 TO *]
>
>    Erik
>
> On Oct 27, 2008, at 9:21 AM, Alexander Ramos Jardim wrote:
>
>> Hey pals,
>>
>> I am trying to delete a couple of documents that don't have any value in a
>> given integer field. This is the command I am executing:
>>
>> $ curl http://:/solr/update -H 'Content-Type:text/xml' -d
>> '-(deptId:[1 TO *])'
>> $ curl http://:/solr/update -H 'Content-Type:text/xml' -d
>> ''
>>
>> But the documents don't get deleted.
>>
>> Solr doesn't return any error message, and its log seems OK. Any idea?
>>
>> --
>> Alexander Ramos Jardim

--
Alexander Ramos Jardim
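Spelled out, the fix Erik suggests amounts to posting these two bodies to /solr/update with Content-Type text/xml. The element names are the standard update-message ones; they were presumably the tags stripped from the quoted curl commands.

```xml
<!-- body of the first request: delete documents with no deptId value.
     The purely negative clause is anchored to *:* so the query is
     no longer purely negative. -->
<delete><query>*:* AND -deptId:[1 TO *]</query></delete>

<!-- body of the second request: commit so the deletes become visible -->
<commit/>
```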
solr 1.3 language managing boosting
Hi,

I have fields in my schema named text_es, text_fr and text_ln, and I would like to boost them according to the field language. How can I do that, given that I have stored all these fields?

Thanks a lot for your help,
Sunny

--
View this message in context: http://www.nabble.com/solr-1.3-language-managing-boosting-tp20193102p20193102.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Entity extraction?
Extractors are exactly as good as the data you have to train or configure them with. An open source extractor platform may still require you to come up with a rather large heap of data from somewhere. Not all the vendors of extractors lose money. How useful NEE is for search is an ongoing question that depends on what sort of data you are working with and what sort of precision challenges most concern you. On Mon, Oct 27, 2008 at 12:34 PM, Walter Underwood <[EMAIL PROTECTED]> wrote: > Verity sold a lot of features based on "we might need it at some point." > Very few people deployed the advanced features. They just didn't need them. > > wunder > > On 10/27/08 9:27 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote: > >> Yeah, when they first mentioned it, my initial thought was "cool, but we >> don't >> need it." However, some of the higher ups in the company are saying we might >> want it at some point, so I've been asked to look into it. I'll be sure to >> let >> them know about the flaws in the concept, thanks for that info. >> >> >> Charlie Jackson >> [EMAIL PROTECTED] >> >> >> -Original Message- >> From: Walter Underwood [mailto:[EMAIL PROTECTED] >> Sent: Monday, October 27, 2008 11:17 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Entity extraction? >> >> The vendor mentioned entity extraction, but that doesn't mean you need it. >> Entity extraction is a pretty specific technology, and it has been a >> money-losing product at many companies for many years, going back to >> Xerox ThingFinder well over ten years ago. >> >> My guess is that very few people really need entity extraction. >> >> Using EE for automatic taxonomy generation is even harder to get right. >> At best, that is a way to get a starter set of categories that you can >> edit. You will not get a production quality taxonomy automatically. 
>> >> wunder >> >> On 10/27/08 8:31 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote: >> >>> True, though I may be able to convince the powers that be that it's worth >>> the >>> investment. >>> >>> There are a number of open source or free tools listed on the Wikipedia >>> entry >>> for entity extraction >>> (http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) >>> -- >>> does anyone have any experience with any of these? >>> >>> >>> Charlie Jackson >>> 312-873-6537 >>> [EMAIL PROTECTED] >>> >>> -Original Message- >>> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] >>> Sent: Monday, October 27, 2008 10:23 AM >>> To: solr-user@lucene.apache.org >>> Subject: Re: Entity extraction? >>> >>> For the record, LingPipe is not free. It's good, but it's not free. >>> >>> >>> Otis >>> -- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> >>> >>> - Original Message From: Rafael Rossini <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Friday, October 24, 2008 6:08:14 PM Subject: Re: Entity extraction? Solr can do a simple facet seach like FAST, but the entity extraction demands other tecnologies. I do not know how FAST does it but at the company I´m working on (www.cortex-intelligence.com), we use a mix of statistical and language-specific tasks to recognize and categorize entities in the text. Ling Pipe is another tool (free) that does that too. In case you would like to see a simple demo: http://www.cortex-intelligence.com/tech/ Rossini On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson > wrote: > During a recent sales pitch to my company by FAST, they mentioned entity > extraction. I'd never heard of it before, but they described it as > basically recognizing people/places/things in documents being indexed > and then being able to do faceting on this data at query time. Does > anything like this already exist in SOLR? If not, I'm not opposed to > developing it myself, but I could use some pointers on where to start. 
> > > > Thanks, > > - Charlie > > >>> >>> >>> >> >> >> > >
Re: Entity extraction?
Verity sold a lot of features based on "we might need it at some point." Very few people deployed the advanced features. They just didn't need them. wunder On 10/27/08 9:27 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote: > Yeah, when they first mentioned it, my initial thought was "cool, but we don't > need it." However, some of the higher ups in the company are saying we might > want it at some point, so I've been asked to look into it. I'll be sure to let > them know about the flaws in the concept, thanks for that info. > > > Charlie Jackson > [EMAIL PROTECTED] > > > -Original Message- > From: Walter Underwood [mailto:[EMAIL PROTECTED] > Sent: Monday, October 27, 2008 11:17 AM > To: solr-user@lucene.apache.org > Subject: Re: Entity extraction? > > The vendor mentioned entity extraction, but that doesn't mean you need it. > Entity extraction is a pretty specific technology, and it has been a > money-losing product at many companies for many years, going back to > Xerox ThingFinder well over ten years ago. > > My guess is that very few people really need entity extraction. > > Using EE for automatic taxonomy generation is even harder to get right. > At best, that is a way to get a starter set of categories that you can > edit. You will not get a production quality taxonomy automatically. > > wunder > > On 10/27/08 8:31 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote: > >> True, though I may be able to convince the powers that be that it's worth the >> investment. >> >> There are a number of open source or free tools listed on the Wikipedia entry >> for entity extraction >> (http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) >> -- >> does anyone have any experience with any of these? >> >> >> Charlie Jackson >> 312-873-6537 >> [EMAIL PROTECTED] >> >> -Original Message- >> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] >> Sent: Monday, October 27, 2008 10:23 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Entity extraction? 
>> >> For the record, LingPipe is not free. It's good, but it's not free. >> >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Message >>> From: Rafael Rossini <[EMAIL PROTECTED]> >>> To: solr-user@lucene.apache.org >>> Sent: Friday, October 24, 2008 6:08:14 PM >>> Subject: Re: Entity extraction? >>> >>> Solr can do a simple facet seach like FAST, but the entity extraction >>> demands other tecnologies. I do not know how FAST does it but at the company >>> I´m working on (www.cortex-intelligence.com), we use a mix of statistical >>> and language-specific tasks to recognize and categorize entities in the >>> text. Ling Pipe is another tool (free) that does that too. In case you would >>> like to see a simple demo: http://www.cortex-intelligence.com/tech/ >>> >>> Rossini >>> >>> >>> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson wrote: >>> During a recent sales pitch to my company by FAST, they mentioned entity extraction. I'd never heard of it before, but they described it as basically recognizing people/places/things in documents being indexed and then being able to do faceting on this data at query time. Does anything like this already exist in SOLR? If not, I'm not opposed to developing it myself, but I could use some pointers on where to start. Thanks, - Charlie >> >> >> > > >
Re: Entity extraction?
Well... IMHO that depends. One of the services we provide is a "automatic clipping" in which our client chooses 20~30 texts from the media he woud like to be aware. With classification algorithms we then keep him aware of every new text of his interest. We gained about 10% of precision just by adding EE information to the algorithm. Rossini On Mon, Oct 27, 2008 at 2:17 PM, Walter Underwood <[EMAIL PROTECTED]>wrote: > The vendor mentioned entity extraction, but that doesn't mean you need it. > Entity extraction is a pretty specific technology, and it has been a > money-losing product at many companies for many years, going back to > Xerox ThingFinder well over ten years ago. > > My guess is that very few people really need entity extraction. > > Using EE for automatic taxonomy generation is even harder to get right. > At best, that is a way to get a starter set of categories that you can > edit. You will not get a production quality taxonomy automatically. > > wunder > > On 10/27/08 8:31 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote: > > > True, though I may be able to convince the powers that be that it's worth > the > > investment. > > > > There are a number of open source or free tools listed on the Wikipedia > entry > > for entity extraction > > ( > http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) > -- > > does anyone have any experience with any of these? > > > > > > Charlie Jackson > > 312-873-6537 > > [EMAIL PROTECTED] > > > > -Original Message- > > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] > > Sent: Monday, October 27, 2008 10:23 AM > > To: solr-user@lucene.apache.org > > Subject: Re: Entity extraction? > > > > For the record, LingPipe is not free. It's good, but it's not free. 
> > > > > > Otis > > -- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > - Original Message > >> From: Rafael Rossini <[EMAIL PROTECTED]> > >> To: solr-user@lucene.apache.org > >> Sent: Friday, October 24, 2008 6:08:14 PM > >> Subject: Re: Entity extraction? > >> > >> Solr can do a simple facet seach like FAST, but the entity extraction > >> demands other tecnologies. I do not know how FAST does it but at the > company > >> I´m working on (www.cortex-intelligence.com), we use a mix of > statistical > >> and language-specific tasks to recognize and categorize entities in the > >> text. Ling Pipe is another tool (free) that does that too. In case you > would > >> like to see a simple demo: http://www.cortex-intelligence.com/tech/ > >> > >> Rossini > >> > >> > >> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson > >>> wrote: > >> > >>> During a recent sales pitch to my company by FAST, they mentioned > entity > >>> extraction. I'd never heard of it before, but they described it as > >>> basically recognizing people/places/things in documents being indexed > >>> and then being able to do faceting on this data at query time. Does > >>> anything like this already exist in SOLR? If not, I'm not opposed to > >>> developing it myself, but I could use some pointers on where to start. > >>> > >>> > >>> > >>> Thanks, > >>> > >>> - Charlie > >>> > >>> > > > > > > > >
RE: Entity extraction?
Yeah, when they first mentioned it, my initial thought was "cool, but we don't need it." However, some of the higher ups in the company are saying we might want it at some point, so I've been asked to look into it. I'll be sure to let them know about the flaws in the concept, thanks for that info. Charlie Jackson [EMAIL PROTECTED] -Original Message- From: Walter Underwood [mailto:[EMAIL PROTECTED] Sent: Monday, October 27, 2008 11:17 AM To: solr-user@lucene.apache.org Subject: Re: Entity extraction? The vendor mentioned entity extraction, but that doesn't mean you need it. Entity extraction is a pretty specific technology, and it has been a money-losing product at many companies for many years, going back to Xerox ThingFinder well over ten years ago. My guess is that very few people really need entity extraction. Using EE for automatic taxonomy generation is even harder to get right. At best, that is a way to get a starter set of categories that you can edit. You will not get a production quality taxonomy automatically. wunder On 10/27/08 8:31 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote: > True, though I may be able to convince the powers that be that it's worth the > investment. > > There are a number of open source or free tools listed on the Wikipedia entry > for entity extraction > (http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) -- > does anyone have any experience with any of these? > > > Charlie Jackson > 312-873-6537 > [EMAIL PROTECTED] > > -Original Message- > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] > Sent: Monday, October 27, 2008 10:23 AM > To: solr-user@lucene.apache.org > Subject: Re: Entity extraction? > > For the record, LingPipe is not free. It's good, but it's not free. 
> > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Rafael Rossini <[EMAIL PROTECTED]> >> To: solr-user@lucene.apache.org >> Sent: Friday, October 24, 2008 6:08:14 PM >> Subject: Re: Entity extraction? >> >> Solr can do a simple facet search like FAST, but the entity extraction >> demands other technologies. I do not know how FAST does it but at the company >> I'm working on (www.cortex-intelligence.com), we use a mix of statistical >> and language-specific tasks to recognize and categorize entities in the >> text. Ling Pipe is another tool (free) that does that too. In case you would >> like to see a simple demo: http://www.cortex-intelligence.com/tech/ >> >> Rossini >> >> >> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson >>> wrote: >> >>> During a recent sales pitch to my company by FAST, they mentioned entity >>> extraction. I'd never heard of it before, but they described it as >>> basically recognizing people/places/things in documents being indexed >>> and then being able to do faceting on this data at query time. Does >>> anything like this already exist in SOLR? If not, I'm not opposed to >>> developing it myself, but I could use some pointers on where to start. >>> >>> >>> >>> Thanks, >>> >>> - Charlie >>> >>> > > >
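As a concrete (and deliberately naive) illustration of the idea Charlie describes — pull entities out of each document at index time, then facet on them at query time — here is a toy sketch. The regex "extractor" and the sample documents are invented for illustration; real NER (LingPipe, etc.) uses statistical models, not a regex:

```python
import re
from collections import Counter

def naive_entities(text):
    # Toy "extractor": runs of two or more capitalized words.
    # Nothing like production entity extraction.
    return re.findall(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)+", text)

docs = [
    "Charlie Jackson met Walter Underwood in San Francisco.",
    "Walter Underwood gave a talk in San Francisco.",
]
# Index time: store the extracted entities in a multivalued field;
# query time: facet on that field. Here Counter stands in for the facet counts.
facets = Counter(e for d in docs for e in naive_entities(d))
```

The facet counts you would get back ("Walter Underwood": 2, "San Francisco": 2, "Charlie Jackson": 1) are exactly what a facet.field on the entity field would produce.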
Greek - solr 1.3
Hi, I would like to know if I have to do something special for Greek characters? My schema is configured like that. It just stores documents which don't have Greek characters; every other language is working fine. Any idea? Thanks a lot, -- View this message in context: http://www.nabble.com/Greek---solr-1.3-tp20191072p20191072.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Entity extraction?
The vendor mentioned entity extraction, but that doesn't mean you need it. Entity extraction is a pretty specific technology, and it has been a money-losing product at many companies for many years, going back to Xerox ThingFinder well over ten years ago. My guess is that very few people really need entity extraction. Using EE for automatic taxonomy generation is even harder to get right. At best, that is a way to get a starter set of categories that you can edit. You will not get a production-quality taxonomy automatically. wunder On 10/27/08 8:31 AM, "Charlie Jackson" <[EMAIL PROTECTED]> wrote: > True, though I may be able to convince the powers that be that it's worth the > investment. > > There are a number of open source or free tools listed on the Wikipedia entry > for entity extraction > (http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) -- > does anyone have any experience with any of these? > > > Charlie Jackson > 312-873-6537 > [EMAIL PROTECTED] > > -Original Message- > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] > Sent: Monday, October 27, 2008 10:23 AM > To: solr-user@lucene.apache.org > Subject: Re: Entity extraction? > > For the record, LingPipe is not free. It's good, but it's not free. > > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message >> From: Rafael Rossini <[EMAIL PROTECTED]> >> To: solr-user@lucene.apache.org >> Sent: Friday, October 24, 2008 6:08:14 PM >> Subject: Re: Entity extraction? >> >> Solr can do a simple facet search like FAST, but the entity extraction >> demands other technologies. I do not know how FAST does it but at the company >> I'm working on (www.cortex-intelligence.com), we use a mix of statistical >> and language-specific tasks to recognize and categorize entities in the >> text. Ling Pipe is another tool (free) that does that too.
In case you would >> like to see a simple demo: http://www.cortex-intelligence.com/tech/ >> >> Rossini >> >> >> On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson >>> wrote: >> >>> During a recent sales pitch to my company by FAST, they mentioned entity >>> extraction. I'd never heard of it before, but they described it as >>> basically recognizing people/places/things in documents being indexed >>> and then being able to do faceting on this data at query time. Does >>> anything like this already exist in SOLR? If not, I'm not opposed to >>> developing it myself, but I could use some pointers on where to start. >>> >>> >>> >>> Thanks, >>> >>> - Charlie >>> >>> > > >
Re: Question about facet.prefix usage
Hi Simon, I came across your post to the solr users list about using facet prefixes, shown below. I was wondering if you were still using your modified version of SimpleFacets.java, and if so -- if you could send me a copy. I'll need to implement something similar, and it never hurts to start from existing material. Thanks, Peter Simon Hu wrote: I also need the exact same feature. I was not able to find an easy solution and ended up modifying class SimpleFacets to make it accept an array of facet prefixes per field. If you are interested, I can email you the modified SimpleFacets.java. -Simon steve berry-2 wrote: Question: Is it possible to pass complex queries to facet.prefix? Example: instead of facet.prefix:foo I want facet.prefix:foo OR facet.prefix:bar My application is for browsing business records that fall into categories. The user is only allowed to see businesses falling into categories which they have access to. I have a series of documents dumped into the following basic structure which I was hoping would help me deal with this: 123 Business Corp. 28255-0001 . charlotte_2006 Banks charlotte_2007 Banks sanfrancisco_2006 Banks sanfrancisco_2007 Banks ... (lots more market_category entries) ... 124 Factory Corp. 28205-0001 . charlotte_2006 Banks charlotte_2007 Banks austin_2006 Banks austin_2007 Banks ... (lots more market_category entries) ... . The multivalued market_category fields are flattened relational data attributed to that business and I want to use those values for faceted navigation /but/ I want the facets to be restricted depending on what products the user has access to. For example a user may have access to sanfrancisco_2007 and sanfrancisco_2006 data but nothing else. So I've created a request using facet.prefix that looks something like this: http://SOLRSERVER:8080/solr/select?q.op=AND&q=docType:gen&facet.field=market_category&facet.prefix=charlotte_2007 This ends up producing perfectly suitable facet results that look like this: ..
1 1 1 1 1 1 0 . Bingo! facet.prefix does exactly what I want it to. Now I want to go a step further and pass a compound statement to the facet.prefix along the lines of "facet.prefix:charlotte_2007 OR sanfrancisco_2007" or "facet.prefix:charlotte_2007 OR charlotte_2006" to return more complex facet sets. As far as I can tell looking at the docs this won't work. Is this possible using the existing facet.prefix functionality? Anyone have a better idea of how I should accomplish this? Thanks, steve berry American City Business Journals
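Since facet.prefix takes only a single prefix per field, one workaround short of Simon's SimpleFacets patch is to facet without a prefix and keep only the values matching any of the user's allowed prefixes on the client side. A minimal sketch — the facet counts here are invented for illustration:

```python
def filter_facets(facet_counts, allowed_prefixes):
    # Keep only the facet values the user is entitled to see.
    return {value: count for value, count in facet_counts.items()
            if any(value.startswith(p) for p in allowed_prefixes)}

# Invented counts, standing in for a full facet.field=market_category response.
counts = {"charlotte_2006": 3, "charlotte_2007": 5,
          "sanfrancisco_2007": 2, "austin_2006": 1}
visible = filter_facets(counts, ["charlotte_2007", "sanfrancisco_2007"])
```

Note this pulls the whole facet list over the wire before filtering, which may matter if the field has many distinct values; issuing one request per allowed prefix and merging the results is the other obvious workaround.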
RE: Entity extraction?
True, though I may be able to convince the powers that be that it's worth the investment. There are a number of open source or free tools listed on the Wikipedia entry for entity extraction (http://en.wikipedia.org/wiki/Named_entity_recognition#Open_source_or_free) -- does anyone have any experience with any of these? Charlie Jackson 312-873-6537 [EMAIL PROTECTED] -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Monday, October 27, 2008 10:23 AM To: solr-user@lucene.apache.org Subject: Re: Entity extraction? For the record, LingPipe is not free. It's good, but it's not free. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Rafael Rossini <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Friday, October 24, 2008 6:08:14 PM > Subject: Re: Entity extraction? > > Solr can do a simple facet search like FAST, but the entity extraction > demands other technologies. I do not know how FAST does it but at the company > I'm working on (www.cortex-intelligence.com), we use a mix of statistical > and language-specific tasks to recognize and categorize entities in the > text. Ling Pipe is another tool (free) that does that too. In case you would > like to see a simple demo: http://www.cortex-intelligence.com/tech/ > > Rossini > > > On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson > > wrote: > > > During a recent sales pitch to my company by FAST, they mentioned entity > > extraction. I'd never heard of it before, but they described it as > > basically recognizing people/places/things in documents being indexed > > and then being able to do faceting on this data at query time. Does > > anything like this already exist in SOLR? If not, I'm not opposed to > > developing it myself, but I could use some pointers on where to start. > > > > > > > > Thanks, > > > > - Charlie > > > >
Re: Advice needed on master-slave configuration
Folks: The replication handler works wonderfully! Thanks all! Now can someone point me at a wiki so I can submit a jira issue lobbying for the inclusion of this replication functionality in a 1.3 patch? Thanks, - Bill -- From: "Noble Paul ??? ??" <[EMAIL PROTECTED]> Sent: Thursday, October 23, 2008 10:34 PM To: Subject: Re: Advice needed on master-slave configuration It was committed on 10/21 take the latest 10/23 build http://people.apache.org/builds/lucene/solr/nightly/solr-2008-10-23.zip On Fri, Oct 24, 2008 at 2:27 AM, William Pierce <[EMAIL PROTECTED]> wrote: I tried the nightly build from 10/18 -- I did the following: a) I downloaded the nightly build of 10/18 (the zip file). b) I unpacked it and copied the war file to my tomcat lib folder. c) I made the relevant changes in the config files per the instructions shown in the wiki. When tomcat starts, I see the error message in tomcat logs... Caused by: java.lang.ClassNotFoundException: solr.ReplicationHandler at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1358) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1204) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:258) ... 36 more Where do I get the nightly bits that will enable me to try this replication handler? Thanks, - Bill -- From: "Noble Paul ??? ??" <[EMAIL PROTECTED]> Sent: Wednesday, October 22, 2008 10:51 PM To: Subject: Re: Advice needed on master-slave configuration If you are using a nightly you can try the new SolrReplication feature http://wiki.apache.org/solr/SolrReplication On Thu, Oct 23, 2008 at 4:32 AM, William Pierce <[EMAIL PROTECTED]> wrote: Otis, Yes, I had forgotten that Windows will not permit me to overwrite files currently in use. So my copy scripts are failing. 
Windows will not even allow a rename of a folder containing a file in use, so I am not sure how to do this. I am going to dig around and see what I can come up with short of stopping/restarting tomcat... Thanks, - Bill -- From: "Otis Gospodnetic" <[EMAIL PROTECTED]> Sent: Wednesday, October 22, 2008 2:30 PM To: Subject: Re: Advice needed on master-slave configuration Normally you don't have to stop Q, but only "reload" the Solr searcher when the index has been copied. However, you are on Windows, and its FS has the tendency not to let you delete/overwrite files that another app (Solr/java) has opened. Are you able to copy the index from U to Q? How are you doing it? Are you deleting index files from the index dir on Q that are no longer in the index dir on U? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: William Pierce <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, October 22, 2008 5:24:28 PM Subject: Advice needed on master-slave configuration Folks: I have two instances of solr running, one on the master (U) and the other on the slave (Q). Q is used for queries only, while U is where updates/deletes are done. I am running on Windows so unfortunately I cannot use the distribution scripts. Every N hours, when changes are committed and the index on U is updated, I want to copy the files from the master to the slave. Do I need to halt the solr server on Q while the index is being updated? If not, how do I copy the files into the data folder while the server is running? Any pointers would be greatly appreciated! Thanks! - Bill -- --Noble Paul -- --Noble Paul
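For anyone stuck on the Windows copy itself: the Unix snappuller essentially mirrors the master's index directory into the slave's. That mirroring can be sketched as below. This is illustrative only — on Windows the delete step can still fail while Solr holds old segment files open, and the slave's searcher must be reloaded after the copy:

```python
import os
import shutil

def sync_index(master_dir, slave_dir):
    """Crude one-way mirror of an index directory, roughly what
    snappuller/rsync do on Unix: copy new or changed files, then
    remove files that are no longer present on the master."""
    os.makedirs(slave_dir, exist_ok=True)
    master = set(os.listdir(master_dir))
    slave = set(os.listdir(slave_dir))
    for name in master:
        src = os.path.join(master_dir, name)
        dst = os.path.join(slave_dir, name)
        # Copy anything missing or newer on the master.
        if name not in slave or os.path.getmtime(src) > os.path.getmtime(dst):
            shutil.copy2(src, dst)
    for name in slave - master:
        # May raise on Windows if Solr still has the file open.
        os.remove(os.path.join(slave_dir, name))
```

The directory and file names are of course placeholders; point it at the master's and slave's `data/index` directories, then reload the searcher on Q.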
Re: Entity extraction?
For the record, LingPipe is not free. It's good, but it's not free. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Rafael Rossini <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Friday, October 24, 2008 6:08:14 PM > Subject: Re: Entity extraction? > > Solr can do a simple facet search like FAST, but the entity extraction > demands other technologies. I do not know how FAST does it but at the company > I'm working on (www.cortex-intelligence.com), we use a mix of statistical > and language-specific tasks to recognize and categorize entities in the > text. Ling Pipe is another tool (free) that does that too. In case you would > like to see a simple demo: http://www.cortex-intelligence.com/tech/ > > Rossini > > > On Fri, Oct 24, 2008 at 6:18 PM, Charlie Jackson > > wrote: > > > During a recent sales pitch to my company by FAST, they mentioned entity > > extraction. I'd never heard of it before, but they described it as > > basically recognizing people/places/things in documents being indexed > > and then being able to do faceting on this data at query time. Does > > anything like this already exist in SOLR? If not, I'm not opposed to > > developing it myself, but I could use some pointers on where to start. > > > > > > > > Thanks, > > > > - Charlie > > > >
solr1.3 / tomcat 55 échelle
Hi, I'm using solr1.3 with tomcat55 and I've got this error when I fire: ...8180/solr/video/select/?q=échelle The response contains: 0 150 échelle 2007-10-31T10:48:34Z 5625531 FR 10 Régis pompier -- View this message in context: http://www.nabble.com/solr1.3---tomcat-55-%3Cb%3E%3Cstr-name%3D%22q%22%3E%C3%83%C2%A9chelle%3C-str%3E%3C-b%3E-tp20189184p20189184.html Sent from the Solr - User mailing list archive at Nabble.com.
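This looks like the classic Tomcat query-string encoding problem: by default Tomcat decodes GET parameters as ISO-8859-1, so a UTF-8 "é" (bytes C3 A9) is read back as "Ã©" — which matches the garbling visible in this thread's subject line. The usual fix is to set URIEncoding="UTF-8" on the HTTP <Connector> in Tomcat's server.xml. A two-line reproduction of the mojibake:

```python
# Reading the browser's UTF-8 bytes as Latin-1 (Tomcat's default for
# GET parameters) reproduces the corruption seen in the subject line.
mangled = "échelle".encode("utf-8").decode("latin-1")
print(mangled)  # Ã©chelle
```

If the round-trip above matches what you see in the response, the index data is fine and only the request decoding needs fixing.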
Re: Sorting performance + replication of index between cores
Hi, After fully reloading my index, using another field than a Date does not help that much. Using a warmup query avoids having the first request slow, but: - Frequent commits mean that the Searcher is reloaded frequently and, as the warmup takes time, the clients must wait. - Having warmup slows down the index process (I guess this is because after a commit, the Searchers are recreated) So I'm considering, as suggested, to have two instances: one for indexing and one for searching. I was wondering if there are simple ways to replicate the index in a single Solr server running two cores? Any such config already tested? I guess that the standard replication based on rsync can be simplified a lot in this case as the two indexes are on the same server. Thanks Christophe Beniamin Janicki wrote: :so you can send your updates anytime you want, and as long as you only :commit every 5 minutes (or commit on a master as often as you want, but :only run snappuller/snapinstaller on your slaves every 5 minutes) your :results will be at most 5minutes + warming time stale. This is what I do as well (commits are done once per 5 minutes). I've got a master - slave configuration. Master has turned off all caches (commented out in solrconfig.xml) and has only 2 maxWarmingSearchers. Index size is 5GB, Xmx=1GB and committing takes around 10 secs (on the default configuration with warming it took from 30 mins up to 2 hours). Slave caches are configured to have autowarmCount="0" and maxWarmingSearchers=1, and I have new data 1 second after the snapshot is done. I haven't noticed any huge delays while serving search requests. Try to use those values - maybe they'll help in your case too. Ben Janicki -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: 22 October 2008 04:56 To: solr-user@lucene.apache.org Subject: Re: Sorting performance : The problem is that I will have hundreds of users doing queries, and a : continuous flow of documents coming in.
: So a delay in warming up a cache "could" be acceptable if I do it a few times : per day. But not on a too regular basis (right now, the first query that loads : the cache takes 150s). : : However: I'm not sure why it looks not to be a good idea to update the caches you can refresh the caches automatically after updating, the "newSearcher" event is fired whenever a searcher is opened (but before it's used by clients) so you can configure warming queries for it -- it doesn't have to be done manually (or by the first user to use that reader) so you can send your updates anytime you want, and as long as you only commit every 5 minutes (or commit on a master as often as you want, but only run snappuller/snapinstaller on your slaves every 5 minutes) your results will be at most 5 minutes + warming time stale. -Hoss
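For reference, the "newSearcher" warming Hoss describes is configured in solrconfig.xml with a QuerySenderListener. A minimal sketch — the query and sort values here are placeholders, not recommendations:

```xml
<!-- solrconfig.xml: run these queries against every new searcher
     before it starts serving clients -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some popular query</str>
      <str name="sort">someDateField desc</str>
    </lst>
  </arr>
</listener>
```

Sorting on a field is what populates the field cache, so the warming query should sort on the same field the slow production queries do.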
solr 1.3 multi language ?
Hi, I'm trying to boost some languages, and I would like to know whether the fields need to be stored to be able to boost them using dismax? Thanks a lot, Sunny -- View this message in context: http://www.nabble.com/solr-1.3-multi-language---tp20188549p20188549.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Delete by query isn't working for a specific query
I don't think delete-by-query supports purely negative queries, even though they are supported for q and fq parameters for searches. Try using: *:* AND -deptId:[1 TO *] Erik On Oct 27, 2008, at 9:21 AM, Alexander Ramos Jardim wrote: Hey pals, I am trying to delete a couple of documents that don't have any value in a given integer field. This is the command I am executing: $ curl http://<host>:<port>/solr/update -H 'Content-Type:text/xml' -d '<delete><query>-(deptId:[1 TO *])</query></delete>' $ curl http://<host>:<port>/solr/update -H 'Content-Type:text/xml' -d '<commit/>' But the documents don't get deleted. Solr doesn't return any error message, and its log seems ok. Any idea? -- Alexander Ramos Jardim
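Spelled out, Erik's workaround anchors the negative clause to a match-all query before POSTing it to /solr/update. A minimal sketch of building the request body (the helper name is invented, and a real client should XML-escape the query string):

```python
def delete_by_query_payload(query):
    # Build the XML body for a delete-by-query POST to /solr/update.
    # Minimal sketch: no XML escaping of the query string.
    return "<delete><query>%s</query></delete>" % query

# Anchor the purely negative clause to *:* so the query parser accepts it.
payload = delete_by_query_payload("*:* AND -deptId:[1 TO *]")
```

POST that payload, then a `<commit/>` body, exactly as in the curl commands above.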
Re: Delete by query isn't working for a specific query
Alexander Ramos Jardim wrote: Hey pals, I am trying to delete a couple of documents that don't have any value in a given integer field. This is the command I am executing: $ curl http://<host>:<port>/solr/update -H 'Content-Type:text/xml' -d '<delete><query>-(deptId:[1 TO *])</query></delete>' $ curl http://<host>:<port>/solr/update -H 'Content-Type:text/xml' -d '<commit/>' But the documents don't get deleted. Solr doesn't return any error message, and its log seems ok. Any idea? I think delete-by-query uses the Lucene query parser, right? So you can't do a pure negative query - gotta do a match-all first. - Mark
Delete by query isn't working for a specific query
Hey pals, I am trying to delete a couple of documents that don't have any value in a given integer field. This is the command I am executing: $ curl http://<host>:<port>/solr/update -H 'Content-Type:text/xml' -d '<delete><query>-(deptId:[1 TO *])</query></delete>' $ curl http://<host>:<port>/solr/update -H 'Content-Type:text/xml' -d '<commit/>' But the documents don't get deleted. Solr doesn't return any error message, and its log seems ok. Any idea? -- Alexander Ramos Jardim