Re: Development inside or outside of Solr?

2012-02-22 Thread bing
Hi, Erick, 

The example is impressive. Thank you. 

For the first, we decided not to do that, as Tika extraction is the
time-consuming part of indexing large files, and the dual call would make
the situation worse. 

For the second, for now we have chosen DSpace to connect to the DB, with
Discovery (Solr) for indexing and querying. Thus, we might make our changes
in DSpace. 

Best Regards, 
Bing 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Development-inside-or-outside-of-Solr-tp3759680p3768977.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Development inside or outside of Solr?

2012-02-22 Thread bing
Hi, François Schiettecatte

Thank you for the reply all the same, but I have chosen to stick with Solr
(with its wrapped Tika language API) and make the changes outside Solr. 

Best Regards, 
Bing 



Re: Development inside or outside of Solr?

2012-02-20 Thread Erick Erickson
Either is possible. For the first, you would write a custom update processor
that handled the dual Tika call...

For the second, consider writing a SolrJ program that just does it all on
the client. Just download Tika from the Apache project (or tease out all
the jars from the Solr distro) and then make it all work on the client.

Here's a sample app:
http://www.lucidimagination.com/blog/2012/02/14/indexing-with-solrj/
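
A minimal sketch of the second option, in plain Java with only the JDK:
detect the language on the client first, then pick the core to index into.
Everything named here is an assumption for illustration and not part of
Solr, SolrJ or Tika: the CoreRouter class, the core names, the base URL,
and the toy stop-word detectLanguage, which only stands in for a real
detector such as Tika's. The actual indexing call is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Sketch: detect the language on the client, then choose the target core. */
class CoreRouter {
    // Hypothetical base URL and core names; adjust to the real multicore setup.
    static final String BASE_URL = "http://localhost:8080/solr/";
    static final String DEFAULT_CORE = "core_other";
    static final Map<String, String> CORE_BY_LANG = new HashMap<>();
    static {
        CORE_BY_LANG.put("en", "core_en");
        CORE_BY_LANG.put("fr", "core_fr");
    }

    // Toy stop-word lists; a stand-in for a real language detector.
    static final Set<String> EN_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "of", "is"));
    static final Set<String> FR_WORDS =
            new HashSet<>(Arrays.asList("le", "la", "les", "et", "est"));

    /** Returns "en", "fr", or "unknown" by counting stop-word hits. */
    static String detectLanguage(String text) {
        int en = 0;
        int fr = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (EN_WORDS.contains(word)) en++;
            if (FR_WORDS.contains(word)) fr++;
        }
        if (en == 0 && fr == 0) return "unknown";
        return en >= fr ? "en" : "fr"; // ties fall back to English
    }

    /** Detection happens first; the result decides which core URL to use. */
    static String coreUrlFor(String text) {
        String lang = detectLanguage(text);
        return BASE_URL + CORE_BY_LANG.getOrDefault(lang, DEFAULT_CORE);
    }
}
```

A client would call CoreRouter.coreUrlFor(extractedText) once per document
and send the document to the returned URL, so the text is extracted and
inspected only once and Solr itself stays unmodified.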

Best
Erick

On Sun, Feb 19, 2012 at 9:44 PM, bing wrote:
> Hi, all,
>
> I am deploying a multicore Solr server running on Tomcat, where I want to
> perform language detection during indexing and querying.
>
> Solr 3.5.0 wraps a Tika API that can do language detection. Currently, the
> default behavior of Solr 3.5.0 is that every time I index a document, Solr
> calls the Tika API at the same time to get the language detection result,
> i.e. indexing and detection happen together. However, I would like to get
> the language detection result first and then decide which core to put the
> document in, i.e. detection happens before indexing.
>
> It seems that I need to do development in one of the following ways:
>
> 1. I might need to modify Solr itself to change its default behavior;
> 2. Or I might write a Java client outside Solr and call it through the
> server (JSP maybe) when indexing/querying.
>
> Can anyone who has faced a similar situation give some suggestions about
> the advantages and disadvantages of the two approaches? Are there any
> other alternatives? Thank you.
>
>
> Best
> Bing


Re: Development inside or outside of Solr?

2012-02-20 Thread François Schiettecatte
You could take a look at this:

http://www.let.rug.nl/vannoord/TextCat/

Will probably require some work to integrate/implement, though.

François

On Feb 20, 2012, at 3:37 AM, bing wrote:

> I have looked into TikaCLI with the --language option and learned that Tika
> can output only the language metadata. It does not solve my problem,
> though, as my main concern is whether to change Solr or not. Thank you all
> the same.



Re: Development inside or outside of Solr?

2012-02-20 Thread bing
I have looked into TikaCLI with the --language option and learned that Tika
can output only the language metadata. It does not solve my problem,
though, as my main concern is whether to change Solr or not. Thank you all
the same. 



Re: Development inside or outside of Solr?

2012-02-19 Thread Oleg Tikhonov
Hi,

I cannot comment on the two approaches you mentioned; however, take a look
at the Tika CLI with the --language option.

Hope it helps,

Oleg




Development inside or outside of Solr?

2012-02-19 Thread bing
Hi, all, 

I am deploying a multicore Solr server running on Tomcat, where I want to
perform language detection during indexing and querying. 

Solr 3.5.0 wraps a Tika API that can do language detection. Currently, the
default behavior of Solr 3.5.0 is that every time I index a document, Solr
calls the Tika API at the same time to get the language detection result,
i.e. indexing and detection happen together. However, I would like to get
the language detection result first and then decide which core to put the
document in, i.e. detection happens before indexing. 

It seems that I need to do development in one of the following ways:

1. I might need to modify Solr itself to change its default behavior; 
2. Or I might write a Java client outside Solr and call it through the
server (JSP maybe) when indexing/querying. 

Can anyone who has faced a similar situation give some suggestions about
the advantages and disadvantages of the two approaches? Are there any other
alternatives? Thank you. 


Best 
Bing  


