Re: Development inside or outside of Solr?
Hi, Erick,

The example is impressive. Thank you.

For the first option, we decided not to do that: Tika extraction is the time-consuming part of indexing large files, and the dual call would make the situation worse.

For the second, for now we have chosen DSpace to connect to the DB, and Discovery (Solr) for indexing and querying. Thus, we might make our changes in DSpace.

Best Regards,
Bing

--
View this message in context: http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3768977.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Development inside or outside of Solr?
Hi, François Schiettecatte,

Thank you for the reply all the same, but I have chosen to stick with Solr (which wraps the Tika language API) and make my changes outside Solr.

Best Regards,
Bing
Re: Development inside or outside of Solr?
Either is possible. For the first, you would write a custom update processor that handles the dual Tika call...

For the second, consider writing a SolrJ program that just does it all on the client. Just download Tika from the Apache project (or tease out all the jars from the Solr distro) and then make it all work on the client.

Here's a sample app:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/

Best
Erick

On Sun, Feb 19, 2012 at 9:44 PM, bing wrote:
> 1. I might need to do revision of Solr itself, change the default behavior
> of Solr;
> 2. Or I might write a Java client outside Solr, call the client through
> server (JSP maybe) in index/query.
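To make the client-side option a little more concrete, here is a minimal sketch of a client that extracts the text once, detects the language, and only then picks a core to index into. It is not taken from the linked sample app. The three interfaces are hypothetical stand-ins: in a real client the extractor and detector might wrap Tika (for example its `LanguageIdentifier` class) and the indexer might wrap one SolrJ server object per core, but none of those libraries is called here. The core names (`core_en`, `core_fr`, `core_other`) and the toy keyword-based detector are assumptions made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for text extraction (a real client might wrap Tika here). */
interface TextExtractor {
    String extract(byte[] rawDocument);
}

/** Hypothetical stand-in for language detection; returns a code such as "en" or "fr". */
interface LanguageDetector {
    String detect(String text);
}

/** Hypothetical stand-in for indexing into a named core (a real client might wrap SolrJ here). */
interface CoreIndexer {
    void index(String coreName, String id, String text, String language);
}

final class DetectThenIndexClient {
    private final TextExtractor extractor;
    private final LanguageDetector detector;
    private final CoreIndexer indexer;
    private final Map<String, String> coreByLanguage;
    private final String defaultCore;

    DetectThenIndexClient(TextExtractor extractor, LanguageDetector detector,
                          CoreIndexer indexer, Map<String, String> coreByLanguage,
                          String defaultCore) {
        this.extractor = extractor;
        this.detector = detector;
        this.indexer = indexer;
        this.coreByLanguage = new HashMap<>(coreByLanguage); // defensive copy
        this.defaultCore = defaultCore;
    }

    /** Extracts once, detects the language, then indexes into the matching core. Returns the core used. */
    String indexDocument(String id, byte[] rawDocument) {
        String text = extractor.extract(rawDocument);  // single extraction pass, reused below
        String language = detector.detect(text);       // detection happens before indexing
        String core = coreByLanguage.getOrDefault(language, defaultCore);
        indexer.index(core, id, text, language);
        return core;
    }
}

final class DetectThenIndexDemo {
    public static void main(String[] args) {
        Map<String, String> cores = new HashMap<>();
        cores.put("en", "core_en"); // hypothetical core names
        cores.put("fr", "core_fr");

        List<String> log = new ArrayList<>();
        DetectThenIndexClient client = new DetectThenIndexClient(
                raw -> new String(raw, StandardCharsets.UTF_8),
                // Toy detector for the demo only; not a real language identifier.
                text -> text.contains("bonjour") ? "fr" : text.contains("hello") ? "en" : "unknown",
                (core, id, text, language) -> log.add(id + " -> " + core + " (" + language + ")"),
                cores,
                "core_other");

        client.indexDocument("doc1", "hello world".getBytes(StandardCharsets.UTF_8));
        client.indexDocument("doc2", "bonjour le monde".getBytes(StandardCharsets.UTF_8));
        client.indexDocument("doc3", "hola mundo".getBytes(StandardCharsets.UTF_8));

        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

Because the extracted text is passed to both the detector and the indexer, the expensive extraction step runs only once per document in this sketch. Whether that holds in a real setup depends on how the extractor and indexer are actually implemented.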
Re: Development inside or outside of Solr?
You could take a look at this:

http://www.let.rug.nl/vannoord/TextCat/

It will probably require some work to integrate/implement, though.

François

On Feb 20, 2012, at 3:37 AM, bing wrote:
> I have looked into the TikaCLI with -language option, and learned that Tika
> can output only the language metadata. It cannot help me to solve my problem
> though, as my main concern is whether to change Solr or not. Thank you all
> the same.
Re: Development inside or outside of Solr?
I have looked into TikaCLI with the --language option and learned that Tika can output only the language metadata. It does not solve my problem, though, as my main concern is whether to change Solr or not. Thank you all the same.
Re: Development inside or outside of Solr?
Hi,

I cannot comment on the two approaches you mention; however, take a look at the Tika CLI with the --language option.

Hope it helps,
Oleg

On Mon, Feb 20, 2012 at 4:44 AM, bing wrote:
> 1. I might need to do revision of Solr itself, change the default behavior
> of Solr;
> 2. Or I might write a Java client outside Solr, call the client through
> server (JSP maybe) in index/query.
Development inside or outside of Solr?
Hi, all,

I am deploying a multicore Solr server running on Tomcat, where I want to perform language detection during indexing and querying.

Solr 3.5.0 wraps a Tika API that can do language detection. Currently, the default behavior of Solr 3.5.0 is that every time I index a document, Solr calls the Tika API at the same time to get the language detection result, i.e. indexing and detection happen together. However, I would like to have the language detection result first and then decide which core to put the document in, i.e. detection happens before indexing.

It seems that I need to do development in one of the following ways:

1. I might need to revise Solr itself and change its default behavior;
2. Or I might write a Java client outside Solr and call it through a server (JSP, maybe) during indexing and querying.

Can anyone who has faced a similar situation offer suggestions about the advantages and disadvantages of the two approaches? Are there any other alternatives? Thank you.

Best
Bing