Re: Math Processing for Solr

2010-04-17 Thread Lance Norskog
Interesting topic. How would you do auto-suggest?

-- 
Lance Norskog
goks...@gmail.com


Re: Math Processing for Solr

2010-04-15 Thread Ryan McKinley
(perhaps more appropriate on solr-user@)

It sounds like you want to make a MathML filter?  Check out the
analyzer packages...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

simple example:
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/LengthFilterFactory.java

ryan
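
For reference, a minimal sketch of what such a factory/filter pair might look
like, modeled on LengthFilterFactory. MathMLFilter and its normalize() step
are hypothetical placeholders, and the exact base classes depend on the
Lucene/Solr version; this assumes the Solr 1.4 / Lucene 2.9-era API:

// Hypothetical sketch only -- names and normalization logic are placeholders.
package org.apache.solr.analysis;

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

public class MathMLFilterFactory extends BaseTokenFilterFactory {
  @Override
  public TokenStream create(TokenStream input) {
    return new MathMLFilter(input);
  }
}

// Rewrites each incoming token (assumed to hold a serialized formula) into a
// normalized string form; real logic would also emit generalized variants.
final class MathMLFilter extends TokenFilter {
  private final TermAttribute termAtt;

  MathMLFilter(TokenStream input) {
    super(input);
    this.termAtt = addAttribute(TermAttribute.class);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    termAtt.setTermBuffer(normalize(termAtt.term()));  // placeholder step
    return true;
  }

  private String normalize(String formula) {
    return formula.trim();
  }
}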



Re: Math Processing for Solr

2010-04-15 Thread Grant Ingersoll
Another option would be a Tika handler that converted the MathML to a Solr 
document.
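
For illustration, a hedged sketch of what a custom Tika parser for MathML
might look like; the media type, class names and the text-extraction step are
placeholders, and exact signatures vary across Tika versions:

// Hypothetical sketch -- a real parser would walk the MathML tree and emit
// meaningful text (or normalized formula strings) for each formula.
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.XHTMLContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class MathMLParser extends AbstractParser {

  @Override
  public Set<MediaType> getSupportedTypes(ParseContext context) {
    // Assumed media type; adjust to whatever the documents actually declare.
    return Collections.singleton(MediaType.application("mathml+xml"));
  }

  @Override
  public void parse(InputStream stream, ContentHandler handler,
                    Metadata metadata, ParseContext context)
      throws IOException, SAXException, TikaException {
    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    // Placeholder: emit the extracted formula text as a plain paragraph.
    xhtml.element("p", extractFormulaText(stream));
    xhtml.endDocument();
  }

  private String extractFormulaText(InputStream stream) {
    // Real implementation would parse the MathML from the stream here.
    return "";
  }
}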

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Math Processing for Solr

2010-04-15 Thread mato
Yes, I considered creating my own analyzer with a set of filters. The trouble
is that I wouldn't be able to set different boosts for the tokens created by
the filters (a filter needs to emit an additional token alongside the input
one and give it a lower boost), which is a crucial piece of functionality
here. Even the tokenizer at the start of the process needs to assign
different boosts to the different tokens it produces. As far as I know,
though, boosts can only be set on Fields.
This is now more of a discussion for the Lucene lists, I guess.

Thanks for the replies anyway.

Martin



Re: Math Processing for Solr

2010-04-15 Thread Grant Ingersoll
Payloads are used to set boosts for tokens.  Have a look at the 
PayloadTermQuery.  There is a patch for support in Solr, but it isn't committed 
yet.

-Grant
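
As a rough illustration of that approach: a token filter can attach a
per-token boost as an encoded payload, which PayloadTermQuery (together with
a Similarity whose scorePayload() decodes it) can fold back into the score.
A minimal sketch against the Lucene 2.9/3.x-era API; the boost lookup is a
placeholder:

// Hypothetical sketch -- attaches an encoded float boost to every token as a
// payload. At query time, PayloadTermQuery plus a custom
// Similarity.scorePayload() turn the payload back into a score factor.
import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.index.Payload;

public final class FormulaBoostPayloadFilter extends TokenFilter {
  private final TermAttribute termAtt;
  private final PayloadAttribute payloadAtt;

  public FormulaBoostPayloadFilter(TokenStream input) {
    super(input);
    this.termAtt = addAttribute(TermAttribute.class);
    this.payloadAtt = addAttribute(PayloadAttribute.class);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    float boost = boostFor(termAtt.term());  // placeholder lookup
    payloadAtt.setPayload(new Payload(PayloadHelper.encodeFloat(boost)));
    return true;
  }

  private float boostFor(String formulaVariant) {
    // Real logic would rank variants, e.g. exact form above generalized form.
    return 1.0f;
  }
}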

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: Math Processing for Solr

2010-04-15 Thread mato
Thanks for the replies.

I think I will leave this for now, because I don't have much time and it
doesn't look very easy to me at the moment. Maybe I'll save it for the
future.

Martin




Math Processing for Solr

2010-04-14 Thread mato
Hello everybody,

I'm new to all this, so I hope this isn't too much of a newbie question and
that it isn't inappropriate here.

I'm currently working on an indexing/searching application based on the
Apache Lucene core that can process mathematical formulae in MathML format
(which is an extension of XML) and store them in the index for searching. No
trouble here, since I'm building everything on top of Lucene.

But I started to think it would be nice to write this mathematical extension
so that it could be incorporated into Solr as easily as possible in the
future. The thing is, I looked into Solr's sources and, to be honest, I'm
confused and don't know which way to approach this.

The basic workflow of the whole math processing would be: check the input
document for any math; if found, the mathematical unit needs to process it
and produce many string-represented formulae with different boosts; put
these into the index without further tokenization.
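
A rough sketch of that workflow at the plain-Lucene level (the field names
and the step that generates the formula variants are placeholders; the field
API shown is the Lucene 2.9/3.x style):

// Hypothetical sketch -- index each string-represented formula variant as a
// separate, un-tokenized field carrying its own boost.
import java.util.Map;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class FormulaIndexer {

  // variants maps each string form of a formula to the boost it should carry
  public Document buildDocument(String plainText, Map<String, Float> variants) {
    Document doc = new Document();
    doc.add(new Field("text", plainText, Field.Store.YES, Field.Index.ANALYZED));

    for (Map.Entry<String, Float> e : variants.entrySet()) {
      // NOT_ANALYZED keeps the formula string as a single token.
      Field f = new Field("formula", e.getKey(),
                          Field.Store.YES, Field.Index.NOT_ANALYZED);
      f.setBoost(e.getValue());
      doc.add(f);
    }
    return doc;
  }
}

Note that Lucene folds the boosts of all same-named fields in a document into
a single norm, so the individual per-formula boosts are not kept apart at
search time; that limitation is why per-token boosting and payloads come up
in the replies.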

That's about it.
Any ideas? Any help will be appreciated.

Thank you

Martin