Re: Translation API question

2015-05-06 Thread Mattmann, Chris A (3980)
One thought I had about this was a TranslatingHandler and/or
a LanguageHandler. That IMO may be the best way to do Language
detection and/or translation in general since that way we could
just easily plug into the output of the existing Parsers, etc.
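A rough sketch of what the handler idea could look like, using only the SAX
classes that ship with the JDK. This is not an existing Tika class: the
names TranslatingHandler, TextCollector and toyTranslate are hypothetical,
and the translator is a plain String-to-String function standing in for
whatever translation backend would actually be used.

```java
import java.util.function.UnaryOperator;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical decorator: forwards every SAX event to the wrapped handler,
// but passes character data through a translator first.
class TranslatingHandler extends XMLFilterImpl {
    private final UnaryOperator<String> translator;

    TranslatingHandler(ContentHandler delegate, UnaryOperator<String> translator) {
        setContentHandler(delegate);
        this.translator = translator;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Simplification: each chunk is translated on its own. A parser may
        // split text across several calls, so a real version would buffer.
        String translated = translator.apply(new String(ch, start, length));
        super.characters(translated.toCharArray(), 0, translated.length());
    }
}

// Hypothetical sink that just accumulates the text it receives.
class TextCollector extends DefaultHandler {
    final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }
}

class TranslatingHandlerDemo {
    // Toy stand-in for a translation service (assumption, not a real API).
    static String toyTranslate(String french) {
        return french.replace("merci", "thanks");
    }

    public static void main(String[] args) throws SAXException {
        TextCollector sink = new TextCollector();
        ContentHandler handler =
                new TranslatingHandler(sink, TranslatingHandlerDemo::toyTranslate);

        char[] chunk = "merci beaucoup".toCharArray();
        handler.startDocument();
        handler.characters(chunk, 0, chunk.length);
        handler.endDocument();

        System.out.println(sink.text); // prints "thanks beaucoup"
    }
}
```

Because the wrapper is itself a ContentHandler, it could in principle be
passed anywhere a parser accepts one, which is the "plug into the output"
point above.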

Else I was thinking about creating a ParserDecorator class, such as a
TranslatingParserDecorator and/or LanguageParserDecorator, to
expose both pieces of information.

Thoughts?

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Tyler Palsulich 
Reply-To: "dev@tika.apache.org" 
Date: Tuesday, May 5, 2015 at 1:21 PM
To: "dev@tika.apache.org" 
Subject: Re: Translation API question

>Hi Sergey,
>
>Unfortunately, not yet. See TIKA-1328.
>
>Tyler
>
>On Tue, May 5, 2015 at 4:51 PM, Sergey Beryozkin 
>wrote:
>
>> Hi All
>>
>> Is it possible to submit a document to the Translation API and get the
>> translated words as a sequence of events? For example, with the regular
>> Tika API it is possible to submit a document and get the metadata and the
>> data, and these data can be indexed, etc.
>>
>> What about submitting a document (e.g., in French) to the Translation API
>> and getting a list of the words in English, so that they can be indexed?
>>
>> I'm thinking that maybe one could then use a query to find all the
>> documents in French that contain a given word as it reads in English.
>> Example: find a French doc containing "thanks", etc.
>>
>> Not sure how much sense it makes though :-)
>>
>> Cheers, Sergey
>>



Re: Translation API question

2015-05-06 Thread Sergey Beryozkin

Hi Tyler, thanks, I'll watch it

Cheers, Sergey

On 06/05/15 00:21, Tyler Palsulich wrote:
> Hi Sergey,
>
> Unfortunately, not yet. See TIKA-1328.
>
> Tyler
>
> On Tue, May 5, 2015 at 4:51 PM, Sergey Beryozkin
> wrote:
>
>> Hi All
>>
>> Is it possible to submit a document to the Translation API and get the
>> translated words as a sequence of events? For example, with the regular
>> Tika API it is possible to submit a document and get the metadata and the
>> data, and these data can be indexed, etc.
>>
>> What about submitting a document (e.g., in French) to the Translation API
>> and getting a list of the words in English, so that they can be indexed?
>>
>> I'm thinking that maybe one could then use a query to find all the
>> documents in French that contain a given word as it reads in English.
>> Example: find a French doc containing "thanks", etc.
>>
>> Not sure how much sense it makes though :-)
>>
>> Cheers, Sergey







Re: Translation API question

2015-05-05 Thread Tyler Palsulich
Hi Sergey,

Unfortunately, not yet. See TIKA-1328.

Tyler

On Tue, May 5, 2015 at 4:51 PM, Sergey Beryozkin 
wrote:

> Hi All
>
> Is it possible to submit a document to the Translation API and get the
> translated words as a sequence of events? For example, with the regular
> Tika API it is possible to submit a document and get the metadata and the
> data, and these data can be indexed, etc.
>
> What about submitting a document (e.g., in French) to the Translation API
> and getting a list of the words in English, so that they can be indexed?
>
> I'm thinking that maybe one could then use a query to find all the
> documents in French that contain a given word as it reads in English.
> Example: find a French doc containing "thanks", etc.
>
> Not sure how much sense it makes though :-)
>
> Cheers, Sergey
>


Translation API question

2015-05-05 Thread Sergey Beryozkin

Hi All

Is it possible to submit a document to the Translation API and get the
translated words as a sequence of events? For example, with the regular
Tika API it is possible to submit a document and get the metadata and
the data, and these data can be indexed, etc.

What about submitting a document (e.g., in French) to the Translation API
and getting a list of the words in English, so that they can be indexed?

I'm thinking that maybe one could then use a query to find all the
documents in French that contain a given word as it reads in English.
Example: find a French doc containing "thanks", etc.
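
To make the idea concrete, here is a small sketch of the kind of index I
have in mind. Nothing here is Tika API: CrossLanguageIndex and toyTranslate
are made-up names, and the word-by-word dictionary is only a toy stand-in
for a real translation service.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Hypothetical in-memory inverted index keyed by translated (English) words.
class CrossLanguageIndex {
    // English word -> ids of the documents whose translation contains it
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final UnaryOperator<String> translator;

    CrossLanguageIndex(UnaryOperator<String> translator) {
        this.translator = translator;
    }

    // Translate the original text, then index every translated word.
    void add(String docId, String originalText) {
        String english = translator.apply(originalText).toLowerCase(Locale.ROOT);
        for (String token : english.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Look up documents by a word as it reads in English.
    Set<String> search(String englishWord) {
        return postings.getOrDefault(
                englishWord.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}

class CrossLanguageIndexDemo {
    // Toy word-by-word French-to-English lookup (assumption, not a real API).
    static String toyTranslate(String french) {
        Map<String, String> dict = Map.of("merci", "thanks", "bonjour", "hello");
        StringBuilder out = new StringBuilder();
        for (String word : french.toLowerCase(Locale.ROOT).split("\\s+")) {
            out.append(dict.getOrDefault(word, word)).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        CrossLanguageIndex index =
                new CrossLanguageIndex(CrossLanguageIndexDemo::toyTranslate);
        index.add("fr-1", "Merci beaucoup");
        index.add("fr-2", "Bonjour le monde");

        System.out.println(index.search("thanks")); // prints "[fr-1]"
    }
}
```

The French documents themselves stay untouched; only the index terms are in
English, so a query for "thanks" leads back to the French doc.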


Not sure how much sense it makes though :-)

Cheers, Sergey