+1

Thanks to Bob for moving forward on this; it's definitely needed.

As one example, the growing list of dependencies is making it increasingly hard
to build a reasonably sized job jar for processing the subset of all docs we
care about in a web crawl.

-- Ken

> From: Bob Paulin
> Sent: November 30, 2015 7:08:17am PST
> To: dev@tika.apache.org
> Subject: Re: more modular parser bundles
> 
> Hi,
> 
> I think Chris actually mentioned that this could be something targeted for
> a 2.0 release.  The first step towards that would be to create the 2.0
> branch since I think this might be a big enough effort to not want to block
> the trunk ( or master if we move to git).  Would the list agree that now
> would be a good time to branch?
> 
> - Bob
> 
> On Mon, Nov 30, 2015 at 6:24 AM, Allison, Timothy B. <talli...@mitre.org>
> wrote:
> 
>> All,
>> 
>>  I'm extremely grateful for all of the new nlp +image processing parsers
>> that we're adding.  Might it be time to start down the implementation path
>> to more modular parser bundles?
>> 
>>  Perhaps we could start with a tika-advanced-bundle to gather all of the
>> nlp/advanced parsers?  Or would this have to wait for Tika 2.0?
>> 
>>  Bob got us off to a great start.  There hasn't been much discussion
>> since August.  I think my email from 24 Aug [1] was the last?
>> 
>>          Cheers,
>> 
>>                        Tim
>> 
>> [1]
>> https://mail-archives.apache.org/mod_mbox/tika-dev/201508.mbox/%3cdm2pr09mb071305dfd203e21bfbe7a63ac7...@dm2pr09mb0713.namprd09.prod.outlook.com%3e
>> 
>> -----Original Message-----
>> From: Madhav Sharan (JIRA) [mailto:j...@apache.org]
>> Sent: Wednesday, November 25, 2015 6:16 PM
>> To: dev@tika.apache.org
>> Subject: [jira] [Created] (TIKA-1803) Use lucene-geo-gazetteer REST API in
>> GeoTopicParser
>> 
>> Madhav Sharan created TIKA-1803:
>> -----------------------------------
>> 
>>             Summary: Use lucene-geo-gazetteer REST API in GeoTopicParser
>>                 Key: TIKA-1803
>>                 URL: https://issues.apache.org/jira/browse/TIKA-1803
>>             Project: Tika
>>          Issue Type: Sub-task
>>          Components: parser
>>            Reporter: Madhav Sharan
>> 
>> 
>> As of now tika uses lucene-geo-gazetteer CLI to extract co-ordinates of a
>> location. CLI requires jvm and lucene to instantiate for every request.
>> With all new REST api it will be possible to gain improvement in this space.
>> 
>> Idea is to create a client of lucene-geo-gazetteer in tika and use it in
>> GeoTopicParser
>> 
>> 
>> 
>> --
>> This message was sent by Atlassian JIRA
>> (v6.3.4#6332)
>> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




