Re: FW: Apache Tika used to parse the Panama papers!
Yes, I read about that too :-)

It would be interesting to hear whether they had any problems, whether they made any support requests, and whether these were answered successfully. Were there any files that failed or were handled poorly? Or was everything so good that no help was needed at all?

I'm delighted that a Java product was used, even though native-code products would likely have been faster.

Tilman

(I'm slightly skeptical about the ICIJ because of the funding and the suspicious lack of US data, but as a huge data archeology project, I love it!)

On 06.04.2016 at 19:18, Allison, Timothy B. wrote:

Looks like quite a few PDFs [0]... Couldn't have done it without you!

Cheers,
Tim

P.S. Tip of the hat to Andreas for RT'ing the link!

[0] https://twitter.com/bigdata/status/717346207312392192

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI: http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5

BTW I know Thomas and am in touch; he wrote an article about MEMEX last year.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++

-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
FW: Apache Tika used to parse the Panama papers!
Looks like quite a few MSG files!

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI: http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5

BTW I know Thomas and am in touch; he wrote an article about MEMEX last year.
FW: Apache Tika used to parse the Panama papers!
Looks like quite a few PDFs [0]... Couldn't have done it without you!

Cheers,
Tim

P.S. Tip of the hat to Andreas for RT'ing the link!

[0] https://twitter.com/bigdata/status/717346207312392192

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI: http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5

BTW I know Thomas and am in touch; he wrote an article about MEMEX last year.
RE: Apache Tika used to parse the Panama papers!
Looks like someone took USC CS572's assignments to a new level. ;)

> Date: Wed, 6 Apr 2016 09:28:49 +0200
> Subject: Re: Apache Tika used to parse the Panama papers!
> From: bdelacre...@apache.org
> To: chris.a.mattm...@jpl.nasa.gov; pr...@apache.org
> CC: dev@tika.apache.org
>
> Hi,
>
> On Wed, Apr 6, 2016 at 12:46 AM, Mattmann, Chris A (3980) wrote:
> > http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak
>
> Note that this also mentions Apache Solr.
>
> -Bertrand
Re: Apache Tika used to parse the Panama papers!
Hi,

On Wed, Apr 6, 2016 at 12:46 AM, Mattmann, Chris A (3980) wrote:
> http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak

Note that this also mentions Apache Solr.

-Bertrand
Apache Tika used to parse the Panama papers!
FYI: http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5

BTW I know Thomas and am in touch; he wrote an article about MEMEX last year.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++