Re: Translation API question
One thought I had about this was a TranslatingHandler and/or a LanguageHandler. That IMO may be the best way to do Language detection and/or translation in general since that way we could just easily plug into the output of the existing Parsers, etc.

Else I was thinking about creating a ParserDetector class for TranslatingParserDecorator and/or LanguageParserDetector to expose both pieces of information.

Thoughts?

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: Tyler Palsulich
Reply-To: "dev@tika.apache.org"
Date: Tuesday, May 5, 2015 at 1:21 PM
To: "dev@tika.apache.org"
Subject: Re: Translation API question

>Hi Sergey,
>
>Unfortunately, not yet. See TIKA-1328.
>
>Tyler
>
>On Tue, May 5, 2015 at 4:51 PM, Sergey Beryozkin wrote:
>
>> Hi All
>>
>> Is it possible to submit a document to the Translation API and get the
>> translated words as a sequence of events ? For example, with a regular
>> Tika API it is possible to submit a document and get the metadata and
>> the data, and these data can be indexed, etc.
>>
>> What about submitting a document (for ex, French) to the translation API
>> and getting a list of the words in English, so that they can be indexed.
>>
>> I'm thinking, may be one then can use a query to find all the documents
>> in French that contain a given word as it reads in English. Example:
>> find a French doc containing "thanks", etc...
>>
>> Not sure how much sense it makes though :-)
>>
>> Cheers, Sergey
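A rough sketch of how the TranslatingHandler idea above might look, assuming it is a SAX ContentHandler decorator placed between a parser and whatever handler consumes its output. This is not Tika's API: only the name TranslatingHandler comes from the message. The constructor, the UnaryOperator used as a stand-in translator, the demo class, and the toy French-to-English dictionary are all hypothetical, and the code uses nothing beyond the JDK's org.xml.sax classes.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical decorator: forwards every SAX event downstream unchanged,
// except character data, which is passed through a translator first.
class TranslatingHandler extends XMLFilterImpl {
    private final UnaryOperator<String> translator; // stand-in for a real translation service

    TranslatingHandler(ContentHandler downstream, UnaryOperator<String> translator) {
        setContentHandler(downstream);
        this.translator = translator;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Naive: SAX may split one run of text across several characters() calls,
        // so a real handler would probably buffer text before translating it.
        String translated = translator.apply(new String(ch, start, length));
        super.characters(translated.toCharArray(), 0, translated.length());
    }
}

class TranslatingHandlerDemo {
    // Toy word-by-word dictionary, for illustration only.
    private static final Map<String, String> DICT = Map.of("merci", "thanks", "bonjour", "hello");

    static String toyTranslate(String text) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.split(" ")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(DICT.getOrDefault(word.toLowerCase(Locale.ROOT), word));
        }
        return sb.toString();
    }

    // Feeds the text through the decorator and returns what the downstream handler saw.
    static String run(String text) {
        StringBuilder seen = new StringBuilder();
        ContentHandler collector = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };
        TranslatingHandler handler = new TranslatingHandler(collector, TranslatingHandlerDemo::toyTranslate);
        char[] chars = text.toCharArray();
        try {
            handler.startDocument();
            handler.characters(chars, 0, chars.length);
            handler.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(run("merci beaucoup"));
    }
}
```

Because the decorator only touches characters(), the downstream handler still receives the same element events it would have seen without translation, which is the "plug into the output of the existing Parsers" property the message is after.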
Re: Translation API question
Hi Tyler,

thanks, I'll watch it

Cheers, Sergey

On 06/05/15 00:21, Tyler Palsulich wrote:
> Hi Sergey,
>
> Unfortunately, not yet. See TIKA-1328.
>
> Tyler
>
> On Tue, May 5, 2015 at 4:51 PM, Sergey Beryozkin wrote:
>
>> Hi All
>>
>> Is it possible to submit a document to the Translation API and get the
>> translated words as a sequence of events ? For example, with a regular
>> Tika API it is possible to submit a document and get the metadata and
>> the data, and these data can be indexed, etc.
>>
>> What about submitting a document (for ex, French) to the translation API
>> and getting a list of the words in English, so that they can be indexed.
>>
>> I'm thinking, may be one then can use a query to find all the documents
>> in French that contain a given word as it reads in English. Example:
>> find a French doc containing "thanks", etc...
>>
>> Not sure how much sense it makes though :-)
>>
>> Cheers, Sergey
Re: Translation API question
Hi Sergey,

Unfortunately, not yet. See TIKA-1328.

Tyler

On Tue, May 5, 2015 at 4:51 PM, Sergey Beryozkin wrote:

> Hi All
>
> Is it possible to submit a document to the Translation API and get the
> translated words as a sequence of events ? For example, with a regular Tika
> API it is possible to submit a document and get the metadata and the data,
> and these data can be indexed, etc.
>
> What about submitting a document (for ex, French) to the translation API
> and getting a list of the words in English, so that they can be indexed.
>
> I'm thinking, may be one then can use a query to find all the documents in
> French that contain a given word as it reads in English. Example: find a
> French doc containing "thanks", etc...
>
> Not sure how much sense it makes though :-)
>
> Cheers, Sergey
Translation API question
Hi All

Is it possible to submit a document to the Translation API and get the translated words as a sequence of events? For example, with the regular Tika API it is possible to submit a document and get the metadata and the data, and this data can then be indexed, etc.

What about submitting a document (for example, in French) to the Translation API and getting a list of the words in English, so that they can be indexed?

I'm thinking that maybe one could then use a query to find all the documents in French that contain a given word as it reads in English. Example: find a French doc containing "thanks", etc...

Not sure how much sense it makes though :-)

Cheers, Sergey
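To make the indexing idea above concrete, here is a small sketch, assuming the translated words are available one at a time. It is not how Tika or any search engine does this: the TranslatedWordIndex class, its methods, the document ids, and the toy dictionary lookup standing in for the Translation API are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical inverted index keyed by the English form of each word,
// so that a French document can be found with an English query term.
class TranslatedWordIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final Map<String, String> dictionary; // toy stand-in for a translation service

    TranslatedWordIndex(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    // Splits the text into words, translates each one, and records the document id.
    void add(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Words missing from the toy dictionary are indexed untranslated.
            String english = dictionary.getOrDefault(token, token);
            postings.computeIfAbsent(english, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents containing the word as it reads in English.
    Set<String> search(String englishWord) {
        return postings.getOrDefault(englishWord.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}

class TranslatedWordIndexDemo {
    public static void main(String[] args) {
        TranslatedWordIndex index =
                new TranslatedWordIndex(Map.of("merci", "thanks", "bonjour", "hello"));
        index.add("doc-fr-1", "Merci pour votre aide");
        index.add("doc-fr-2", "Bonjour tout le monde");
        System.out.println(index.search("thanks"));
    }
}
```

Word-by-word lookup is of course much cruder than real translation, which depends on context; the sketch only shows where translated terms would enter the index.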