Re: Entity extraction?
Hi,

You can use the OpenNLP maximum entropy library to build your own named-entity extraction. I used it in one of my Solr projects. Most NLP libraries are easy to integrate with Solr. In our case, though, named-entity extraction was embedded in our crawler, which populated a field called entities in the database; we then ingested that into Solr as just another field.

--Thanks and Regards
Vaijanath N. Rao

Julien Nioche wrote:
> Hi,
>
> Open Source NLP platforms like GATE (http://gate.ac.uk) or Apache UIMA are
> typically used for these types of tasks. GATE in particular comes with an
> application called ANNIE, which does Named Entity Recognition. OpenCalais does
> that as well and should be easy to embed, but unlike UIMA- or GATE-based
> applications it can't be tuned to do more specific things.
>
> Depending on the architecture you have in mind, it could be worth investigating
> Nutch and adding the NER as a custom plugin; since NLP is often a CPU-intensive
> task, you could leverage the scalability of Hadoop in Nutch. There is a patch
> that allows indexing to be delegated to Solr. As someone else already said,
> these named entities could then be used as facets.
>
> HTH
> Julien
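
As a rough illustration of the approach described above (entities extracted upstream, stored in a multi-valued field, then faceted on), here is a minimal Python sketch. Only the field name entities comes from the thread; the id and text fields, the helper names, and the base URL are assumptions made for the example. The entity list is passed in by hand, standing in for whatever OpenNLP, GATE or UIMA would produce.

```python
from urllib.parse import urlencode


def build_entity_doc(doc_id, text, entities):
    """Return a flat document with a multi-valued 'entities' field.

    'id' and 'text' are hypothetical field names; 'entities' is the field
    name mentioned in the thread. Duplicate entities are collapsed.
    """
    return {"id": doc_id, "text": text, "entities": sorted(set(entities))}


def facet_query_url(base_url, query):
    """Build a select URL that asks Solr to facet on the 'entities' field."""
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", "entities"),
        ("rows", "10"),
    ]
    return base_url + "?" + urlencode(params)


doc = build_entity_doc("1", "Apple opens a store in Paris", ["Paris", "Apple", "Paris"])
url = facet_query_url("http://localhost:8983/solr/select", "text:paris")
```

The document would be sent to Solr by whatever indexing path is already in place; the sketch only shows the shape of the data and of the facet request.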
Re: Query problem related to * symbol
On Sat, Oct 25, 2008 at 2:00 PM, Aleksey Gogolev <[EMAIL PROTECTED]> wrote:
> I made this query:
> http://localhost:8983/solr/select/?q=suggestion:ipod+nano+80*

Note that in Lucene syntax, this query is equivalent to

suggestion:ipod default_field:nano default_field:80*

For debugging, add debugQuery=true to your request to see what the parsed query looks like.

-Yonik
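
To make the point above concrete, here is a small Python sketch that builds both forms of the request with debugQuery=true. It relies on the Lucene query syntax for field grouping, suggestion:(...), which applies the field to every term inside the parentheses. The host and path are taken from the original question; the helper name select_url is an assumption made for the example.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the original question.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def select_url(query, debug=False):
    """Build a select URL, optionally asking Solr to echo the parsed query."""
    params = [("q", query)]
    if debug:
        params.append(("debugQuery", "true"))
    return SOLR_SELECT + "?" + urlencode(params)


# Only 'ipod' is searched in 'suggestion'; the other terms use the default field.
ungrouped = select_url("suggestion:ipod nano 80*", debug=True)

# Field grouping: all three terms are searched in 'suggestion'.
grouped = select_url("suggestion:(ipod nano 80*)", debug=True)
```

Comparing the parsed query in the debug section of the two responses should show which field each term was actually matched against.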
Query problem related to * symbol
Hello.

I made this query:

http://localhost:8983/solr/select/?q=suggestion:ipod+nano+80*

and the response contains the following doc:

04adea06fcfdc939feec63799045076c apple ma045 for ipod 80gb nano 2008-10-25T16:50:48.703Z

Then I made this query (with the letter "g" added):

http://localhost:8983/solr/select/?q=suggestion:ipod+nano+80g*

and I expected to see the same doc in the response, but the response was empty.

At first I thought this strange behaviour was caused by SynonymFilter, but I checked the type of the "suggestion" field; it is quite simple, and its filter chain doesn't contain SynonymFilter: -- --

Any ideas about the reason for this strange behaviour?

--
Aleksey Gogolev
developer, dev.co.ua
Re: Lucene project & subprojects news RSS feed?
I don't believe there is one, but a patch would be welcome to add one.

On Oct 24, 2008, at 6:46 PM, David Smiley @MITRE.org wrote:
> On the main lucene web page: http://lucene.apache.org/index.html
> There is a list of news items spanning all the lucene subprojects. Does
> anyone know if there is an RSS feed or "announce" mailing list that has this
> information?
>
> ~ David Smiley

--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: customizing results in StandardQueryHandler
: Subject: customizing results in StandardQueryHandler
: In-Reply-To: <[EMAIL PROTECTED]>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, and your question is "hidden" in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/Thread_hijacking

-Hoss
Re: Entity extraction?
Hi,

Open Source NLP platforms like GATE (http://gate.ac.uk) or Apache UIMA are typically used for these types of tasks. GATE in particular comes with an application called ANNIE, which does Named Entity Recognition. OpenCalais does that as well and should be easy to embed, but unlike UIMA- or GATE-based applications it can't be tuned to do more specific things.

Depending on the architecture you have in mind, it could be worth investigating Nutch and adding the NER as a custom plugin; since NLP is often a CPU-intensive task, you could leverage the scalability of Hadoop in Nutch. There is a patch that allows indexing to be delegated to Solr. As someone else already said, these named entities could then be used as facets.

HTH
Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com

2008/10/24 Rogerio Pereira <[EMAIL PROTECTED]>

> I agree, Ryan, and I would like to see a complete integration between Solr,
> Nutch, Tika and Mahout in the future.
>
> 2008/10/24 Ryan McKinley <[EMAIL PROTECTED]>
>
> > This is not something solr does currently...
> >
> > It sounds like something that should be added to Mahout:
> > http://lucene.apache.org/mahout/
> >
> > On Oct 24, 2008, at 4:18 PM, Charlie Jackson wrote:
> >
> >> During a recent sales pitch to my company by FAST, they mentioned entity
> >> extraction. I'd never heard of it before, but they described it as
> >> basically recognizing people/places/things in documents being indexed
> >> and then being able to do faceting on this data at query time. Does
> >> anything like this already exist in SOLR? If not, I'm not opposed to
> >> developing it myself, but I could use some pointers on where to start.
> >>
> >> Thanks,
> >>
> >> - Charlie
>
> --
> Regards,
>
> Rogério (_rogerio_)
>
> [Blog: http://faces.eti.br] [Sandbox: http://bmobile.dyndns.org]
> [Twitter: http://twitter.com/ararog]
>
> "Make a difference! Help your country grow; don't hold on to knowledge,
> share it and learn more."
> (http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)
Re: How to search a DataImportHandler solr index
Oh. There is nothing wrong with indexing or querying. Solr cannot store or return a document like

flash 50x50 100x100 gif 50x50 100x100

A Solr/Lucene document is not really an object tree. It is a flat object in which each value is either single-valued or a collection.

But you can do something like the following: have fields such as size_flash, size_gif and size_jpg and, depending on the banner type, store the sizes in the appropriate field.

BTW can be shortened to

On Fri, Oct 24, 2008 at 6:48 PM, Nick80 <[EMAIL PROTECTED]> wrote:
> Hi,
>
> below is a simplified copy of my data-config file:
>
> url="jdbc:mysql://localhost/campaign" user="root" password=""/>
> y
>
> I have defined the following fields in schema.xml:
>
> multiValued="true" omitNorms="true" termVectors="true" />
> multiValued="true" omitNorms="true" termVectors="true" />
>
> Hope that makes it a bit clearer. Thanks.
>
> Kind regards,
>
> Nick

--
--Noble Paul
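
The flat-field suggestion in the reply above can be sketched in a few lines of Python. This is only an illustration of the data shape, not DataImportHandler configuration: the size_ prefix and the banner types come from the reply, while the id field, the function name and the input layout are assumptions made for the example.

```python
def flatten_banner_sizes(doc_id, sizes_by_type):
    """Turn a nested {banner_type: [sizes]} mapping into one flat document.

    Each banner type becomes its own multi-valued field named
    'size_<type>' (for example size_flash, size_gif, size_jpg).
    'id' is a hypothetical unique-key field.
    """
    doc = {"id": doc_id}
    for banner_type, sizes in sizes_by_type.items():
        doc["size_" + banner_type] = list(sizes)
    return doc


flat = flatten_banner_sizes(
    "campaign-1",
    {"flash": ["50x50", "100x100"], "gif": ["50x50", "100x100"]},
)
```

With the sizes stored this way, a query such as size_flash:50x50 can find campaigns that have a flash banner of that size, which a single shared size field could not express.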