Setting up an Apache Tika Meetup [was: Recording/Streaming Apache Tika Virtual Meetings to YouTube]

2021-10-15 Thread Tim Allison
All,
  Unless there are objections, I'll set up an Apache Tika Meetup later
today.  If there are better media options, let me know.

 Best,

Tim

On Thu, Oct 14, 2021 at 12:05 PM Tim Allison wrote:
>
> Lewis,
>   Thank you for getting the ball rolling on this.  I think it would be
> great to have semi-regular meetings of devs and/or community outreach.
> For example, I'd like to host an outreachy, tika-eval deep dive for
> [1].  I can think of a few other outreachy topics, especially around
> migrating to 2.x.
>   Any objections if I started a Meetup group?  Did we ever settle on a 
> platform?
>
>  Cheers,
>
> Tim
>
>
> [1] https://www.dpconline.org/events/world-digital-preservation-day
>
> On Wed, May 19, 2021 at 1:57 PM lewis john mcgibbney wrote:
> >
> > Hi Swapnil,
> > Excellent, thank you. Replies inline below.
> >
> > On Wed, May 19, 2021 at 9:53 AM Swapnil M Mane wrote:
> >
> > >
> > > If it is a community meetup where participants are actively
> > > involved in the conversation, we should not go for YouTube Live.
> > >
> >
> > It IS a community meetup in which participants actively engage and
> > trade opinions. So it sounds like YouTube Live is not the correct
> > solution.
> >
> >
> > > One of the popular tools used for live streams is StreamYard. You
> > > can find more details here [1].
> > >
> >
> > I had never heard of it, thanks for the pointer.
> >
> >
> > >
> > > By the way, which tool did the community use for the last meeting
> > > (Zoom, Google Meet, or something else)?
> >
> >
> > The meeting was hosted on a paid version of WebEx. It would be great if we
> > could move away from this for the next meeting.
> >
> > lewismc


[jira] [Commented] (TIKA-3575) Cannot use loadErrorHandler="ignore" in tika config

2021-10-15 Thread Andreas Hubold (Jira)


[ https://issues.apache.org/jira/browse/TIKA-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17429131#comment-17429131 ]

Andreas Hubold commented on TIKA-3575:
--

Thanks [~tallison], I'd suggest one of the following:
 * either change the default for loadErrorHandler in 
TikaConfig#serviceLoaderFromDomElement back to IGNORE (this would be my 
preferred choice, and a very simple change)
 * or keep the default at THROW but extend #serviceLoaderFromDomElement to 
check for a value of "ignore" in the attribute and respect that. If the 
default is THROW now, it should also be the default when no service-loader 
element is specified; otherwise it feels inconsistent and could surprise 
users. If you search for org.apache.tika.config.LoadErrorHandler#IGNORE, you 
can see that it's still the default in some places.
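
To make the second option concrete, here is a minimal, self-contained sketch of the attribute handling being proposed. It is not Tika's actual code: the class name LoadErrorHandlerSketch, the Policy enum, and both method names are hypothetical stand-ins, and the "properties" root element is only assumed for the example. It maps "ignore", "warn" and "throw" case-insensitively and uses a single default both when the attribute is missing and when there is no service-loader element at all. Rejecting unknown values is a design choice of the sketch, not something this issue asks for.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LoadErrorHandlerSketch {

    /** Hypothetical stand-in for the three policies discussed in this issue. */
    enum Policy { IGNORE, WARN, THROW }

    /** One default, used for a missing element and for a missing attribute. */
    static final Policy DEFAULT_POLICY = Policy.THROW;

    /** Maps the attribute text to a policy, case-insensitively. */
    static Policy parsePolicy(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_POLICY;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "ignore":
                return Policy.IGNORE;
            case "warn":
                return Policy.WARN;
            case "throw":
                return Policy.THROW;
            default:
                // Sketch-only choice: fail loudly on typos instead of guessing.
                throw new IllegalArgumentException(
                        "Unknown loadErrorHandler value: " + value);
        }
    }

    /** Reads loadErrorHandler from the first service-loader element, if any. */
    static Policy policyFromConfig(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        NodeList nodes = root.getElementsByTagName("service-loader");
        if (nodes.getLength() == 0) {
            return DEFAULT_POLICY;
        }
        // getAttribute returns "" when the attribute is absent.
        Element serviceLoader = (Element) nodes.item(0);
        return parsePolicy(serviceLoader.getAttribute("loadErrorHandler"));
    }

    public static void main(String[] args) throws Exception {
        // "properties" as the root element is an assumption of this example.
        System.out.println(policyFromConfig(
                "<properties><service-loader loadErrorHandler=\"ignore\"/></properties>"));
        System.out.println(policyFromConfig("<properties/>"));
    }
}
```

With this shape, switching the default back to IGNORE (the first option) would be a one-line change to DEFAULT_POLICY.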

{quote}The goal was to allow finer-grained module selection so that you'd never 
have load errors that you'd want to ignore.
{quote}
I really like the separation into modules in Tika 2.x. That's a great 
improvement!

Our use case for LoadErrorHandler#IGNORE: It can still be useful to include a 
module but exclude some of its parsers/dependencies. For example we're using 
tika-parser-code-module but just don't need Matlab and SAS7BDATParser, so we 
want to exclude parso and jmatio dependencies to reduce the number of 
dependencies. It's a nice feature that this disables the parsers without 
additional necessary configuration in tika config (and our downstream users 
could simply add dependencies to enable parsers without touching configuration).

I think it's a good idea to bundle different parsers into logical modules, like 
different code parsers in tika-parser-code-module. But sometimes that may not 
be fine-grained enough, and that's where LoadErrorHandler#IGNORE plays a nice 
role, IMHO.
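
The behavior we rely on can be sketched roughly as follows. This is only an illustration and not how Tika's ServiceLoader is implemented: the class OptionalParserLoadingSketch, the LoadProblemHandler interface, and the class name com.example.MissingParser are all hypothetical. The idea is that a parser whose dependency has been excluded simply fails to load, and an "ignore" handler drops it while the remaining parsers stay available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OptionalParserLoadingSketch {

    /** Hypothetical callback deciding what happens when a class cannot be loaded. */
    interface LoadProblemHandler {
        void handle(String className, Throwable problem);
    }

    /** Silently skips anything that fails to load. */
    static final LoadProblemHandler IGNORE = (className, problem) -> { };

    /** Fails fast on the first class that cannot be loaded. */
    static final LoadProblemHandler THROW = (className, problem) -> {
        throw new IllegalStateException("Unable to load " + className, problem);
    };

    /** Returns the classes that could be loaded; failures are passed to the handler. */
    static List<Class<?>> loadAvailable(List<String> classNames,
                                        LoadProblemHandler handler) {
        List<Class<?>> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name));
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers NoClassDefFoundError, which is what a
                // class with an excluded dependency typically triggers.
                handler.handle(name, e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        // com.example.MissingParser is a made-up name standing in for a
        // parser whose dependency was excluded from the classpath.
        List<String> names =
                Arrays.asList("java.lang.String", "com.example.MissingParser");
        System.out.println(loadAvailable(names, IGNORE).size()); // prints 1
    }
}
```

With the THROW handler the same call fails on the missing class, which is the situation we run into with the current 2.x default.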

> Cannot use loadErrorHandler="ignore" in tika config
> ---
>
> Key: TIKA-3575
> URL: https://issues.apache.org/jira/browse/TIKA-3575
> Project: Tika
>  Issue Type: Bug
>  Components: config
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Andreas Hubold
>Priority: Major
>  Labels: regression
>
> Tika 2.0.0 changed the default error handler to throw exceptions, and no 
> longer ignores errors when loading parsers, as was the case in Tika 
> 1.x.
> See 
> https://github.com/apache/tika/commit/e47c6cd62e587fdaae7e2e999f37122d09449754#diff-3955d56f4d95c6e600966c486c58f92483c900d32d553d18b3cf2940cbf2c768R470
> There's no configuration option to restore the previous behavior. It should 
> be possible to set
> {code}
> 
> {code}
> but the code in org.apache.tika.config.TikaConfig#serviceLoaderFromDomElement 
> only considers "warn" and "throw" as possible values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)