Let's move the conversation to the existing, still-open ticket:
https://issues.apache.org/jira/browse/TIKA-1334

:)

I'm really excited about this!


-----Original Message-----
From: Tyler Bui-Palsulich [mailto:tpalsul...@apache.org] 
Sent: Tuesday, May 2, 2017 7:20 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.15

Thanks for the link. It looks like the UI is written with Angular and uses 
Elastic + static JSON. See 
https://github.com/USCDataScience/polar-deep-insights/wiki/Architecture.

I also like d3. In general, I think we are on the same page: the best option is
a web-based UI.

I see a few options to get data into the frontend:
1. Static JSON
2. JSON from a server (meaning the server runs queries, built by either the client or the server)
3. Load a local DB (meaning the client runs queries)

From some quick searching, option 3 seems to have poor support, though I could be wrong.

Options 1 and 2 are closely related. If we have a working application with static JSON,
changing it to use served JSON should be straightforward (probably from a Java server).
Static JSON will be faster than live queries, but I don't know how long the queries
take. The polar project seems to hard-code queries and provide an interface for
manually entering more.

Static JSON seems easiest to get started. What do you think?

Tyler

On May 2, 2017 6:57 AM, "Chris Mattmann" <mattm...@apache.org> wrote:

> Team, check out Polar Insights, which my USC IRDS student Nithin did:
>
> http://polar.usc.edu/html/polar-deep-insights/index.html#/config
>
> Click Download, then Download (the 2 download buttons), then Save, 
> then click the Query Interface. Something like this?
>
> All code is OSS on 
> http://github.com/USCDataScience/polar-deep-insights/
>
> Cheers,
> Chris
>
>
> On 5/2/17, 4:54 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>
>     Y.  It is daunting at this point, and please do help!
>
>     The key sheets I look at:
>
>     exceptions/exceptions_compared_by_mime_type.xlsx
>     exceptions/new_exceptions_in_B_by_mime.xlsx
>
>     mimes/mime_diffs_A_to_B.xlsx
>
>     attachments/attachment_diffs.xlsx
>
>     metadata/metadata_value_count_diffs.xlsx
>
>     I can dump JSON, but wouldn't it be easier for you to pull
>     directly from the DB?
>
>     My vision is to put a GUI on the DB that would allow you to
>     visualize the reports, see the data, and have links to the original
>     (binary) files plus the extract files for both A and B (perhaps with a
>     diff visualization).
>
>     Three cheers for d3.
>
>
>     -----Original Message-----
>     From: Tyler Bui-Palsulich [mailto:tpalsul...@apache.org]
>     Sent: Monday, May 1, 2017 11:39 PM
>     To: dev@tika.apache.org
>     Subject: RE: Tika 1.15
>
>     How exactly did you "evaluate" the results? I opened the zip and
>     looked at a few of the sheets, but it's a bit daunting.
>
>     Is there any way we could dump JSON? That's a bit easier to build
>     visualizations for.
>
>     Tyler
>
>     On May 1, 2017 3:59 PM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>
>     > Sounds good.  W00t!
>     >
>     > -----Original Message-----
>     > From: Chris Mattmann [mailto:mattm...@apache.org]
>     > Sent: Monday, May 1, 2017 4:57 PM
>     > To: dev@tika.apache.org
>     > Subject: Re: Tika 1.15
>     >
>     > Thanks, Tim. I am going to try to get tika-dl added (if possible), and
>     > also try the Sentiment Parser next. If I can get one or both of those
>     > in within the next day or so, I will give you the heads-up to begin
>     > testing. Video recognition is in!
>     >
>     > On 5/1/17, 12:42 PM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>     >
>     >     I finally had a chance to look through the results of the first
>     > regression run.
>     >
>     >     I made a few trivial changes to our parsers and to tika-eval.
>     >
>     >     We appear to have many more exceptions in files parsed by our
>     > CompressorParser, but this is a reporting artifact, not a real
>     > change: the exception is now reported on the container file rather
>     > than on an attachment, and tika-eval wasn't matching A and B correctly.
>     >
>     >     There is a regression that's been fixed in PDFBox trunk
>     > (PDFBOX-3717), but I don't see that as a blocker.
>     >
>     >     We have new exceptions in the new parsers (EMF, WMF, .xlsb,
>     > WordPerfect), but that's because we're actually parsing those now. :)
>     >
>     >     All else looks to be in decent shape.
>     >
>     >     Chris and Team and All,
>     >       Let me know when you're ready for me to kick off the next
>     > regression run.
>     >
>     >               Cheers,
>     >
>     >                       Tim
>     >
>     >     -----Original Message-----
>     >     From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
>     >     Sent: Wednesday, April 26, 2017 12:48 PM
>     >     To: dev@tika.apache.org
>     >     Subject: Re: Tika 1.15
>     >
>     >     Thank you!
>     >
>     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     >     Chris Mattmann, Ph.D.
>     >     Principal Data Scientist, Engineering Administrative Office (3010)
>     >     Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
>     >     NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>     >     Office: 180-503E, Mailstop: 180-503
>     >     Email: chris.a.mattm...@nasa.gov
>     >     WWW:  http://sunset.usc.edu/~mattmann/
>     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     >     Director, Information Retrieval and Data Science Group (IRDS)
>     >     Adjunct Associate Professor, Computer Science Department
>     >     University of Southern California, Los Angeles, CA 90089 USA
>     >     WWW: http://irds.usc.edu/
>     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     >
>     >
>     >     On 4/26/17, 9:35 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>     >
>     >         Oh.  Ok.  Will wait, then?
>     >
>     >         -----Original Message-----
>     >         From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
>     >         Sent: Wednesday, April 26, 2017 11:38 AM
>     >         To: dev@tika.apache.org
>     >         Subject: Re: Tika 1.15
>     >
>     >         I want to see if I can get the VideoRecognition parser in, and
>     > also the Sentiment one.
>     >
>     >         I hope to get it done in the next day or so. Thanks.
>     >
>     >
>     >     On 4/26/17, 7:54 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:
>     >
>     >             With the added TSD parser, I think I should rerun the
>     > regression testing. I also fixed 2099, so we'll benefit from a rerun.
>     >
>     >             Anything else before I rerun the regression testing?
>     >
>     >             Any problems observed in first run?