Re: Entity extraction?

2008-10-25 Thread Vaijanath N. Rao

Hi,

One can use the OpenNLP maximum entropy library and create their own
named-entity extraction.

I used it in one of my projects with Solr.

It is easy to integrate most NLP libraries with Solr. In our case, though,
named-entity extraction was embedded in our crawler, which populated a field
called entities in the database; we then ingested that into Solr as just
another field.
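A minimal sketch of that crawler-side approach, assuming Java 9 or later. The extractor here is a hypothetical dictionary lookup standing in for a real OpenNLP model, and the field names id, text and entities are assumed schema fields (entities would need to be declared multiValued). Each extracted entity becomes its own field element in a Solr XML update message, which is how a multi-valued field is sent to Solr.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EntityFieldSketch {

    // Hypothetical stand-in for a trained NER model: a tiny gazetteer.
    static final List<String> GAZETTEER = List.of("Solr", "Nutch", "Apache");

    // Naive substring matching: return gazetteer entries found in the text,
    // in gazetteer order and without duplicates.
    static List<String> extractEntities(String text) {
        Set<String> found = new LinkedHashSet<>();
        for (String name : GAZETTEER) {
            if (text.contains(name)) {
                found.add(name);
            }
        }
        return new ArrayList<>(found);
    }

    // Minimal XML escaping for element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build a Solr XML <add> message; the multi-valued "entities" field is
    // written as one <field> element per value.
    static String toSolrAddXml(String id, String text, List<String> entities) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        sb.append("<field name=\"id\">").append(escape(id)).append("</field>");
        sb.append("<field name=\"text\">").append(escape(text)).append("</field>");
        for (String e : entities) {
            sb.append("<field name=\"entities\">").append(escape(e)).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        String text = "Apache Solr can index pages crawled by Nutch.";
        System.out.println(toSolrAddXml("doc1", text, extractEntities(text)));
    }
}
```

In a real setup the extraction would run inside the crawler (as described above) and only the resulting message would be posted to Solr's update handler.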


--Thanks and Regards
Vaijanath N. Rao





Re: Query problem related to * symbol

2008-10-25 Thread Yonik Seeley
On Sat, Oct 25, 2008 at 2:00 PM, Aleksey Gogolev <[EMAIL PROTECTED]> wrote:
> I made this query:
> http://localhost:8983/solr/select/?q=suggestion:ipod+nano+80*

Note that in Lucene syntax, this query is equivalent to
suggestion:ipod default_field:nano default_field:80*

For debugging, add debugQuery=true to your request to see what the
parsed query looks like.
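A minimal sketch of building the request in Java (assuming Java 10 or later for the Charset overload of URLEncoder.encode). It uses standard Lucene field grouping, suggestion:(ipod nano 80*), so the field applies to every clause rather than only to "ipod", and it appends debugQuery=true as suggested above. The base URL and field name are taken from the thread; the helper name selectUrl is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FieldScopedQuery {

    // Build a /select URL. With "suggestion:(ipod nano 80*)" the field prefix
    // covers all clauses inside the parentheses; with
    // "suggestion:ipod nano 80*" only "ipod" is searched in that field.
    static String selectUrl(String baseUrl, String q, boolean debug) {
        String url = baseUrl + "/select/?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
        if (debug) {
            url += "&debugQuery=true"; // ask Solr to show the parsed query
        }
        return url;
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://localhost:8983/solr",
                "suggestion:(ipod nano 80*)", true));
    }
}
```

Comparing the parsed query in the debug output for the grouped and ungrouped forms should make the difference in field scoping visible.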

-Yonik


Query problem related to * symbol

2008-10-25 Thread Aleksey Gogolev

Hello.

I made this query:
http://localhost:8983/solr/select/?q=suggestion:ipod+nano+80*

and the response contains the following doc:

  04adea06fcfdc939feec63799045076c
  apple ma045 for ipod 80gb nano
  2008-10-25T16:50:48.703Z


Then I made this query (with the letter "g" added):
http://localhost:8983/solr/select/?q=suggestion:ipod+nano+80g*
and I expected to see the same doc in the response, but the response was empty.

At first I thought this strange behaviour was caused by SynonymFilter, but I
checked the type of the "suggestion" field: it is quite simple, and its filter
chain doesn't contain a SynonymFilter.

Any ideas about the reason for this strange behaviour?

-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey



Re: Lucene project & subprojects news RSS feed?

2008-10-25 Thread Grant Ingersoll

I don't believe there is one, but a patch would be welcome to add one.

On Oct 24, 2008, at 6:46 PM, David Smiley @MITRE.org wrote:



On the main Lucene web page, http://lucene.apache.org/index.html, there is
a list of news items spanning all the Lucene subprojects. Does anyone know
if there is an RSS feed or "announce" mailing list that has this
information?

~ David Smiley
--
View this message in context: 
http://www.nabble.com/Lucene-project---subprojects-news-RSS-feed--tp20158991p20158991.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Re: customizing results in StandardQueryHandler

2008-10-25 Thread Chris Hostetter

: Subject: customizing results in StandardQueryHandler
: In-Reply-To: <[EMAIL PROTECTED]>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message; instead, start a fresh email. Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to, so your question is "hidden" in that thread and gets less 
attention. It also makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking
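To illustrate why changing the subject does not help, here is a simplified sketch of how a threading mail client might group messages: it follows the In-Reply-To header (defined in RFC 5322) back to the earliest known message and never looks at Subject. Real clients also use the References header; the Message record, the helper threadRoot and the example message IDs are hypothetical, and the record syntax assumes Java 16 or later.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ThreadRootSketch {

    // Simplified message carrying only the headers relevant here.
    record Message(String messageId, String inReplyTo, String subject) {}

    // Walk In-Reply-To links back to the earliest known ancestor.
    // The Subject header is never consulted; the "seen" set guards
    // against malformed reply cycles.
    static Message threadRoot(Message m, Map<String, Message> byId) {
        Set<String> seen = new HashSet<>();
        Message current = m;
        while (current.inReplyTo() != null
                && byId.containsKey(current.inReplyTo())
                && seen.add(current.messageId())) {
            current = byId.get(current.inReplyTo());
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Message> byId = new HashMap<>();
        Message earlier = new Message("<1@example.org>", null, "Some earlier topic");
        // A reply with a brand-new subject that is still In-Reply-To the earlier message.
        Message hijack = new Message("<2@example.org>", "<1@example.org>",
                "customizing results in StandardQueryHandler");
        byId.put(earlier.messageId(), earlier);
        byId.put(hijack.messageId(), hijack);
        // The new question is filed under the earlier thread.
        System.out.println(threadRoot(hijack, byId).subject());
    }
}
```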


-Hoss



Re: Entity extraction?

2008-10-25 Thread Julien Nioche
Hi,

Open source NLP platforms like GATE (http://gate.ac.uk) or Apache UIMA are
typically used for these types of tasks. GATE in particular comes with an
application called ANNIE, which does Named Entity Recognition. OpenCalais
does that as well and should be easy to embed, but unlike UIMA- or GATE-based
applications it can't be tuned to do more specific things.

Depending on the architecture you have in mind, it could be worth
investigating Nutch and adding the NER as a custom plugin; since NLP is often
a CPU-intensive task, you could leverage the scalability of Hadoop in Nutch.
There is a patch that allows delegating the indexing to Solr. As someone
else already said, these named entities could then be used as facets.
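Once entities are indexed, faceting on them is a matter of request parameters. A minimal sketch in Java (Java 10 or later) that builds such a request, assuming a multi-valued field named entities and the default local Solr URL; facet, facet.field and facet.mincount are standard Solr faceting parameters, while the helper queryString is hypothetical. A Map cannot hold repeated keys, so faceting on several fields would need a list of pairs instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EntityFacetQuery {

    // URL-encode each parameter and join them with '&', keeping insertion order.
    static String queryString(Map<String, String> params) {
        StringJoiner sj = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            sj.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");                 // match all documents
        params.put("facet", "true");            // enable faceting
        params.put("facet.field", "entities");  // assumed entity field
        params.put("facet.mincount", "1");      // skip entities with zero hits
        System.out.println("http://localhost:8983/solr/select/?" + queryString(params));
    }
}
```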

HTH

Julien
-- 
DigitalPebble Ltd
http://www.digitalpebble.com

2008/10/24 Rogerio Pereira <[EMAIL PROTECTED]>

> I agree, Ryan, and I would like to see complete integration between Solr,
> Nutch, Tika and Mahout in the future.
>
> 2008/10/24 Ryan McKinley <[EMAIL PROTECTED]>
>
> > This is not something solr does currently...
> >
> > It sounds like something that should be added to Mahout:
> > http://lucene.apache.org/mahout/
> >
> >
> >
> > On Oct 24, 2008, at 4:18 PM, Charlie Jackson wrote:
> >
> >  During a recent sales pitch to my company by FAST, they mentioned entity
> >> extraction. I'd never heard of it before, but they described it as
> >> basically recognizing people/places/things in documents being indexed
> >> and then being able to do faceting on this data at query time. Does
> >> anything like this already exist in SOLR? If not, I'm not opposed to
> >> developing it myself, but I could use some pointers on where to start.
> >>
> >>
> >>
> >> Thanks,
> >>
> >> - Charlie
> >>
> >>
> >
>
>
> --
> Regards,
>
> Rogério (_rogerio_)
>
> [Blog: http://faces.eti.br]  [Sandbox: http://bmobile.dyndns.org]
>  [Twitter:
> http://twitter.com/ararog]
>
> "Make a difference! Help your country grow: don't hold on to knowledge,
> share it and learn more."
> (http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)
>


Re: How to search a DataImportHandler solr index

2008-10-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
Oh. There is nothing wrong with the indexing or querying.
Solr cannot store or return a document like

 flash
   50x50
   100x100
 gif
   50x50
   100x100

A Solr/Lucene Document is not really an object tree. It is a flat object
whose field values can be either single-valued or a collection (multi-valued).

But you can do the following: have fields like size_flash, size_gif and
size_jpg, and depending on the banner type, store the sizes in the
appropriate field.
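A minimal sketch of that flattening step in Java (Java 9 or later), assuming the nested data arrives as a map from banner type to its list of sizes, as in the flash/gif example. Each type becomes a size_&lt;type&gt; field holding a list of values, i.e. a multi-valued field in a flat document. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenBannerSizes {

    // Turn {type -> sizes} into a flat {size_<type> -> sizes} document.
    // Repeated types would append to the same multi-valued field.
    static Map<String, List<String>> flatten(Map<String, List<String>> sizesByType) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : sizesByType.entrySet()) {
            doc.computeIfAbsent("size_" + e.getKey(), k -> new ArrayList<>())
               .addAll(e.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, List<String>> banners = new LinkedHashMap<>();
        banners.put("flash", List.of("50x50", "100x100"));
        banners.put("gif", List.of("50x50", "100x100"));
        // prints {size_flash=[50x50, 100x100], size_gif=[50x50, 100x100]}
        System.out.println(flatten(banners));
    }
}
```

Queries can then target a specific banner type directly, e.g. a query on size_gif:50x50, provided those fields are declared multiValued in schema.xml.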

BTW
 
can be shortened to
 



On Fri, Oct 24, 2008 at 6:48 PM, Nick80 <[EMAIL PROTECTED]> wrote:

>
> Hi,
>
> below is a simplified copy of my data-config file:
>
> 
>  url="jdbc:mysql://localhost/campaign" user="root" password=""/>
>
>
>  
>
>  
> 
> 
> 
>  
>  
>
> 
> 
>
> I have defined the following fields in schema.xml:
>
> 
> 
>  multiValued="true" omitNorms="true" termVectors="true" />
>  multiValued="true" omitNorms="true" termVectors="true" />
>
> Hope that makes it a bit clearer. Thanks.
>
> Kind regards,
>
> Nick
>
>


-- 
--Noble Paul