Re: Math Processing for Solr
Interesting topic. How would you do auto-suggest?

On 4/15/10, m...@gjgt.sk wrote:
> Thanks for the replies. I think I will leave this thing for now because I don't have much time and it doesn't look very easy to me at the moment. Maybe I'll save it for the future. [...]

--
Lance Norskog
goks...@gmail.com
Re: Math Processing for Solr
(Perhaps more appropriate on solr-user@.)

It sounds like you want to make a MathML filter? Check out the analyzer packages:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

A simple example:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/LengthFilterFactory.java

ryan

2010/4/14 m...@gjgt.sk:
> Hello everybody, I'm new to all this so I hope this isn't too noob a question [...] I'm currently working on an indexing/searching application based on the Apache Lucene core that can process mathematical formulae in MathML format and store them in the index for searching. [...]
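[Editor's sketch] The filter-chain idea Ryan points at can be illustrated without pulling in the Lucene classes. The sketch below is a JDK-only analogy, not the real Lucene TokenFilter API; the names `Token` and `MathVariantFilter` are made up for illustration. Each filter wraps an upstream token source and may emit extra, lower-weighted variants per input token, which is roughly what a MathML filter would do.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// A toy token carrying a per-token weight. Note that Lucene tokens carry
// no boost of their own, which is exactly the problem raised later in
// this thread (payloads are the suggested workaround).
class Token {
    final String text;
    final float boost;
    Token(String text, float boost) { this.text = text; this.boost = boost; }
}

// Analogue of a token filter: wraps an upstream token iterator and may
// emit additional, lower-weighted variants for each input token.
class MathVariantFilter implements Iterator<Token> {
    private final Iterator<Token> input;
    private final Deque<Token> pending = new ArrayDeque<>();

    MathVariantFilter(Iterator<Token> input) { this.input = input; }

    public boolean hasNext() { return !pending.isEmpty() || input.hasNext(); }

    public Token next() {
        if (!pending.isEmpty()) return pending.pop();
        Token t = input.next();
        // Hypothetical generalization step: also index the formula with
        // concrete numbers replaced by a wildcard, at half the weight.
        String generalized = t.text.replaceAll("\\d+", "#");
        if (!generalized.equals(t.text)) {
            pending.push(new Token(generalized, t.boost * 0.5f));
        }
        return t;
    }
}

public class FilterDemo {
    public static void main(String[] args) {
        Iterator<Token> in = List.of(new Token("x+2", 1.0f)).iterator();
        MathVariantFilter filter = new MathVariantFilter(in);
        while (filter.hasNext()) {
            Token t = filter.next();
            System.out.println(t.text + " " + t.boost);  // x+2 1.0, then x+# 0.5
        }
    }
}
```

In the real API the chain would instead be a `Tokenizer` followed by `TokenFilter`s registered through factories in the Solr schema, as the LengthFilterFactory example above shows.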
Re: Math Processing for Solr
Another option would be a Tika handler that converted the MathML to a Solr document.

On Apr 15, 2010, at 2:38 AM, Ryan McKinley wrote:
> It sounds like you want to make a MathML filter? Check out the analyzer packages... [...]

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: Math Processing for Solr
Yes, I considered creating my own analyzer with a set of filters. The trouble is that I wouldn't be able to set different boosts for the tokens created by the filters (the filters need to create additional tokens alongside the input token and set a lower boost on them), which is a crucial piece of functionality. Even the tokenizer at the beginning of the process needs to set different boosts on the different tokens it produces. As far as I know, though, it is possible to set boosts only on Fields. This is now more of a discussion for the Lucene lists, I guess. Thanks for the replies anyway.

Martin

> (Perhaps more appropriate on solr-user@.) It sounds like you want to make a MathML filter? Check out the analyzer packages... [...]
Re: Math Processing for Solr
Payloads are used to set boosts for tokens. Have a look at the PayloadTermQuery. There is a patch for support in Solr, but it isn't committed yet.

-Grant

On Apr 15, 2010, at 8:46 AM, m...@gjgt.sk wrote:
> Yes, I considered creating own analyzer with a set of filters. Trouble is, that I wouldn't be able to set different boosts for the tokens created by the filters [...]

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
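[Editor's sketch] A payload, as Grant suggests, is just a per-token byte array stored alongside the token's position in the postings, so a per-token boost can ride along as an encoded float; Lucene's PayloadHelper performs essentially this encode/decode. Here is a minimal JDK-only sketch of the encoding (the class and method names are made up; only the 4-byte float layout matches what Lucene uses):

```java
import java.nio.ByteBuffer;

// A per-token boost smuggled in as a payload: the payload is a byte[]
// attached to a token position, and a float boost fits in 4 bytes.
public class BoostPayload {

    // Encode a token boost as a 4-byte payload at analysis time.
    static byte[] encode(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    // Decode the payload back into a boost at scoring time
    // (where PayloadTermQuery would fold it into the score).
    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] p = encode(0.5f);
        System.out.println(p.length + " bytes, boost=" + decode(p));
    }
}
```

In a real analyzer the filter would set this byte array on each token's payload attribute, and a payload-aware query would read it back per match.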
Re: Math Processing for Solr
Thanks for the replies. I think I will leave this thing for now because I don't have much time and it doesn't look very easy to me at the moment. Maybe I'll save it for the future.

Martin

> Payloads are used to set boosts for tokens. Have a look at the PayloadTermQuery. There is a patch for support in Solr, but it isn't committed yet. [...]
Math Processing for Solr
Hello everybody,

I'm new to all this, so I hope this isn't too noob a question and that it isn't inappropriate here. I'm currently working on an indexing/searching application based on the Apache Lucene core that can process mathematical formulae in MathML format (an XML-based markup language) and store them in the index for searching. No trouble so far, since I'm building everything on top of Lucene. But I started to think it would be nice to write this mathematical extension so that it could be incorporated into Solr as easily as possible in the future. The thing is, I looked into Solr's sources and, to be honest, I'm confused and don't know which way to do this.

The basic workflow of the whole math processing would be: check the input document for any math; if found, a mathematical unit needs to process it and produce many string-represented formulae with different boosts; put these into the index without further tokenization. That's about it. Any ideas? Any help will be appreciated.

Thank you
Martin
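[Editor's sketch] To make the workflow above concrete: a minimal JDK-only sketch of what the "mathematical unit" might do, under the assumption that MathML islands appear as `<math>` elements in the input. It serializes each formula to a compact string and also emits a generalized variant (leaf values wildcarded) at a lower boost, so approximate matches can still score. All names here (`MathUnit`, `WeightedFormula`, `flatten`) are hypothetical, and real MathML canonicalization would be far more involved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MathUnit {

    // One string-represented formula plus the boost to index it with.
    record WeightedFormula(String text, float boost) {}

    // Find every <math> island, emit its exact form at full boost plus a
    // generalized form (leaf values collapsed to "*") at a lower boost.
    static List<WeightedFormula> process(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        List<WeightedFormula> out = new ArrayList<>();
        NodeList maths = doc.getElementsByTagName("math");
        for (int i = 0; i < maths.getLength(); i++) {
            String exact = flatten((Element) maths.item(i), false);
            String general = flatten((Element) maths.item(i), true);
            out.add(new WeightedFormula(exact, 1.0f));
            if (!general.equals(exact)) out.add(new WeightedFormula(general, 0.5f));
        }
        return out;
    }

    // Serialize a MathML subtree into a compact prefix string, e.g.
    // mrow(mi(x),mo(+),mn(2)); when generalize is true, wildcard the leaves.
    static String flatten(Element e, boolean generalize) {
        StringBuilder sb = new StringBuilder(e.getTagName()).append('(');
        NodeList kids = e.getChildNodes();
        boolean first = true;
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            String piece;
            if (k instanceof Element child) {
                piece = flatten(child, generalize);
            } else {
                String text = k.getTextContent().trim();
                if (text.isEmpty()) continue;
                piece = generalize ? "*" : text;
            }
            if (!first) sb.append(',');
            sb.append(piece);
            first = false;
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><math><mrow><mi>x</mi><mo>+</mo><mn>2</mn></mrow></math></doc>";
        for (WeightedFormula f : process(xml)) {
            System.out.println(f.boost() + " " + f.text());
        }
    }
}
```

The resulting strings would then go into the index untokenized, as the workflow describes, e.g. as values of a string-typed field.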