Re: FW: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Tilman Hausherr

Yes I read about that too :-)

It would be interesting to hear whether they had any problems, whether 
they made any support requests, and whether these were answered 
successfully. Were there any files that failed or parsed poorly? Or was 
everything so good that no help was needed at all?


I'm delighted that a Java product was used, even though native-code 
products would likely have been faster.


Tilman (I'm slightly skeptical about the ICIJ because of the funding and 
the suspicious lack of US data, but as a huge data archeology project, I 
love it!)


On 06.04.2016 at 19:18, Allison, Timothy B. wrote:

Looks like quite a few PDFs [0]...

Couldn't have done it without you!

Cheers,

Tim

P.S. Tip of the hat to Andreas for retweeting the link!

[0] https://twitter.com/bigdata/status/717346207312392192

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5


BTW, I know Thomas and am in touch. He wrote an article about MEMEX last year.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++











FW: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Allison, Timothy B.
Looks like quite a few MSG files!


-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5


BTW, I know Thomas and am in touch. He wrote an article about MEMEX last year.








FW: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Allison, Timothy B.
Looks like quite a few PDFs [0]...

Couldn't have done it without you! 

Cheers,

   Tim

P.S. Tip of the hat to Andreas for retweeting the link!

[0] https://twitter.com/bigdata/status/717346207312392192 

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Tuesday, April 05, 2016 6:47 PM
To: dev@tika.apache.org
Cc: pr...@apache.org
Subject: Apache Tika used to parse the Panama papers!

FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5


BTW, I know Thomas and am in touch. He wrote an article about MEMEX last year.








RE: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Vasu Jain
Looks like someone took USC CS572's assignments to a new level. ;)
> Date: Wed, 6 Apr 2016 09:28:49 +0200
> Subject: Re: Apache Tika used to parse the Panama papers!
> From: bdelacre...@apache.org
> To: chris.a.mattm...@jpl.nasa.gov; pr...@apache.org
> CC: dev@tika.apache.org
> 
> Hi,
> 
> On Wed, Apr 6, 2016 at 12:46 AM, Mattmann, Chris A (3980)
> 
> > http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak
> 
> Note that this also mentions Apache Solr.
> 
> -Bertrand
  

Re: Apache Tika used to parse the Panama papers!

2016-04-06 Thread Bertrand Delacretaz
Hi,

On Wed, Apr 6, 2016 at 12:46 AM, Mattmann, Chris A (3980)

> http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak

Note that this also mentions Apache Solr.

-Bertrand


Apache Tika used to parse the Panama papers!

2016-04-05 Thread Mattmann, Chris A (3980)
FYI:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech_source=TWITTER_medium=social_channel=Technology=23087770#709893771df5


BTW, I know Thomas and am in touch. He wrote an article about MEMEX
last year.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++