[jira] [Commented] (TIKA-3770) General upgrades for 1.28.3
[ https://issues.apache.org/jira/browse/TIKA-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539362#comment-17539362 ] Hudson commented on TIKA-3770: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #207 (See [https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/207/]) TIKA-3770: update uimaj-core (tilman: [https://github.com/apache/tika/commit/4c4b92811c71788e9a275f7765fcca074b3c11ec]) * (edit) tika-parsers/pom.xml > General upgrades for 1.28.3 > --- > > Key: TIKA-3770 > URL: https://issues.apache.org/jira/browse/TIKA-3770 > Project: Tika > Issue Type: Improvement >Reporter: Tim Allison >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.7#820007)
Final reminder: ApacheCon North America call for presentations closing soon
[Note: You're receiving this because you are subscribed to one or more Apache Software Foundation project mailing lists.] This is your final reminder that the Call for Presentations for ApacheCon North America 2022 will close at 00:01 GMT on Monday, May 23rd, 2022. Please don't wait! Get your talk proposals in now! Details here: https://apachecon.com/acna2022/cfp.html --Rich, for the ApacheCon Planners
Re: Automatic updates?
I just click the button. Is your username asfgit? Kidding! I have no idea why your name isn't showing up.

On Thu, May 19, 2022 at 12:03 AM Tilman Hausherr wrote:
> What do you do differently than I do? I noticed that in the recent
> commit, your name appears in the PR, mine doesn't.
>
> https://github.com/apache/tika/pull/562/
> https://github.com/apache/tika/pull/563/
> "asfgit"
>
> https://github.com/apache/tika/pull/564/
> "tballison"
>
> Could it be because I'm using my real mail address in commits and you're
> using your apache mail address?
>
> Tilman
>
> On 18.05.2022 at 15:36, Tim Allison wrote:
> > Oh, ok, phew. Thank you, Tilman. I remember seeing you merge some of the
> > others, and I agree, I'm not able to see that history now.
> >
> > As long as our AI overlords haven't taken control of our code without some
> > kind of manual review, all good. Thank you.
> >
> > On Wed, May 18, 2022 at 8:37 AM Tilman Hausherr wrote:
> >
> >> The previous one (not this one) was me, I merged the branch locally and
> >> then pushed it.
> >>
> >> So there's still a manual step but somehow the history doesn't show this.
> >>
> >> Tilman
> >>
> >> --- Original message ---
> >> From: Tim Allison
> >> Subject: Automatic updates?
> >> Date: May 18, 2022, 14:25
> >> To:
> >>
> >> All,
> >> It feels like something changed in the last week with our dependabot
> >> integration. We used to get PRs. Now we're getting PRs that are
> >> automatically merged.
> >> I don't think this is a great idea. What do you think?
> >>
> >> Best,
> >>
> >> Tim
> >>
> >> On Wed, May 18, 2022 at 1:55 AM GitBox wrote:
> >>
> >>> dependabot[bot] opened a new pull request, #562:
> >>> URL: https://github.com/apache/tika/pull/562
> >>>
> >>> Bumps [zstd-jni](https://github.com/luben/zstd-jni) from 1.5.2-2 to 1.5.2-3.
> >>>
> >>> Commits
> >>>
> >>> - c983ae3 Adjust signature comments after e5c6a3290b8335db7c70877fda22ca26a96c72e4. (https://github.com/luben/zstd-jni/commit/c983ae3e086b63a40e1bb430cb2ebf95ecc52c71)
> >>> - 510bbd6 Add methods for streaming (de)compression of direct ByteBuffers. (https://github.com/luben/zstd-jni/commit/510bbd6be80592227c6e5cf8cd8d71cb76c0c279)
> >>> - 62b9dad Fix lgtm C++. (https://github.com/luben/zstd-jni/commit/62b9dad49fc00f253cb35c1942c3ca6af4ee2b47)
> >>> - 73ae46e v1.5.2-3 (https://github.com/luben/zstd-jni/commit/73ae46e1af16619143b7c87e35ad9c05363e2c97)
> >>> - e5c6a32 Fix overflows (https://github.com/luben/zstd-jni/commit/e5c6a3290b8335db7c70877fda22ca26a96c72e4)
> >>> - 54d3d50 Fix some error return codes. (https://github.com/luben/zstd-jni/commit/54d3d50c16d96bd8a30e2d4c0a9648001a52d6f9)
> >>> - b788a2e Upgrade scala. (https://github.com/luben/zstd-jni/commit/b788a2ed7a5e36e5252b1696e6cc8bae48a7afbc)
> >>> - 3106093 Add NoFinalizer variants for the direct buffer streams. (https://github.com/luben/zstd-jni/commit/31060934c26e080031465702ec369591e12874f8)
> >>> - See full diff in compare view: https://github.com/luben/zstd-jni/compare/v1.5.2-2...v1.5.2-3
> >>>
> >>> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.github.luben:zstd-jni&package-manager=maven&previous-version=1.5.2-2&new-version=1.5.2-3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
> >>>
> >>> Dependabot will resolve any conflicts with this PR as long as you don't
> >>> alter it yourself. You can also trigger a rebase manually by commenting
> >>> `@dependabot rebase`.
> >>>
> >>> [//]: # (dependabot-automerge-start)
> >>> [//]: # (dependabot-automerge-end)
> >>>
> >>> ---
> >>>
> >>> Dependabot commands and options
> >>>
> >>> You can trigger Dependabot actions by commenting on this PR:
> >>> - `@dependabot rebase` will rebase this PR
> >>> - `@dependabot recreate` will recreate this PR, overwriting any edits
> >>> that have been made to it
> >>> - `@dependabot merge` will merge this PR after your CI passes on it
> >>> - `@dependabot squash and merge` will squash and merge this PR after
> >>> your CI passes on it
> >>> - `@dependabot cancel merge` will cancel a previously requested merge
> >>> and block automerging
> >>> - `@dependabot reopen` will reopen this PR if it is closed
> >>> - `@dependabot close` will close this PR and stop Dependabot recreating
> >>> it. You can achieve the same result by closing it manually
> >>> - `@dependabot ignore this major version` will close this PR and stop
> >>> Dependabot creating any more for this
[jira] [Commented] (TIKA-3770) General upgrades for 1.28.3
[ https://issues.apache.org/jira/browse/TIKA-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539498#comment-17539498 ] Tim Allison commented on TIKA-3770: --- I had to make some subtle changes in how we were calling one of the underlying dl4j libraries. I can look at the commit history in main and cherrypick that into 1.x if anyone wants to update those dependencies.
[GitHub] [tika] tballison closed pull request #566: Bump solr-solrj from 8.11.1 to 9.0.0
tballison closed pull request #566: Bump solr-solrj from 8.11.1 to 9.0.0 URL: https://github.com/apache/tika/pull/566 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [tika] tballison commented on pull request #566: Bump solr-solrj from 8.11.1 to 9.0.0
tballison commented on PR #566: URL: https://github.com/apache/tika/pull/566#issuecomment-1131754057 @dependabot ignore this major version Solrj 9 requires Java 11. We can't upgrade while we're still on Java 8.
[GitHub] [tika] dependabot[bot] commented on pull request #566: Bump solr-solrj from 8.11.1 to 9.0.0
dependabot[bot] commented on PR #566: URL: https://github.com/apache/tika/pull/566#issuecomment-1131754159 OK, I won't notify you about version 9.x.x again, unless you re-open this PR or update to a 9.x.x release yourself.
[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539574#comment-17539574 ] Tim Allison commented on TIKA-3710: --- Sorry, that comment must have referred to the patterns in that block that allowed content before the html tags. The patterns currently require the {{Is it valid for a message/rfc822 message to have a bunch of preamble like the >HTML tags in my document before the headers? My memory is that we've seen some crazy headers before the usual rfc822 headers. I do not think we've seen html tags in those. > HTML document detected incorrect as message/rfc822 > -- > > Key: TIKA-3710 > URL: https://issues.apache.org/jira/browse/TIKA-3710 > Project: Tika > Issue Type: Bug > Components: detector >Affects Versions: 2.3.0 >Reporter: Sam Stephens >Priority: Major > Attachments: html-that-looks-like-rfc822.html > > > I'm detecting content types and extracting text from documents using the > AutoDetectParser. > I've received some documents that are HTML fragments generated from emails. > The documents are clearly HTML, not emails, but the AutoDetectParser gives me > the MIME type message/rfc822 and no text. I've attached an example. > It looks like the presence of From:, Sent:, and Subject: at the beginning of > lines is why the documents are matching RFC822. However, I believe the > presence of HTML before these headers means the document is not valid RFC822. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539580#comment-17539580 ] Tim Allison commented on TIKA-3710: --- This works on the test file: {noformat} {noformat}
[jira] [Comment Edited] (TIKA-3710) HTML document detected incorrect as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539574#comment-17539574 ] Tim Allison edited comment on TIKA-3710 at 5/19/22 2:25 PM: Sorry, that comment must have referred to the patterns in that block that allowed content before the html tags. The patterns currently require the {{Is it valid for a message/rfc822 message to have a bunch of preamble like the >HTML tags in my document before the headers? My memory is that we've seen some crazy headers before the usual rfc822 headers. I do not think we've seen html tags in those.
[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539582#comment-17539582 ] Nick Burch commented on TIKA-3710: -- I was thinking we'd do (open)h1(close) or (open)h1(space) to cover both HTML cases but reduce the chances of a false positive match (+h2/h3)
[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539590#comment-17539590 ] Tim Allison commented on TIKA-3710: --- Sounds good. What do you think of breaking those out into a higher priority block as above? Obv, we'll need to run this on a bunch of docs to see if this is overall a good change...
[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539594#comment-17539594 ] Nick Burch commented on TIKA-3710: -- As a "normal" html file wouldn't start with these snippets, and they're already at a pretty high priority, I think just leave them in the 60 block along with the more typical starting tags we have there now
[jira] [Commented] (TIKA-3710) HTML document detected incorrect as message/rfc822
[ https://issues.apache.org/jira/browse/TIKA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539607#comment-17539607 ] Tim Allison commented on TIKA-3710: --- The current main block is 40, which is intentionally below RFC822. How's this look: {noformat} ... {noformat}
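The priority discussion in this thread can be illustrated with a small, self-contained sketch. It is not Tika's detector and does not reproduce the contents of tika-mimetypes.xml: the class and rule names are hypothetical, the RFC822 priority of 50 is an assumption (the thread only says that 40 is below it and that the HTML fragment patterns sit in the 60 block), and the exact patterns and offsets are stand-ins. The idea shown is only that, when several magic rules match, the highest-priority match decides the type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of priority-ordered magic matching; not Tika's implementation. */
final class PriorityMagicSketch {

    /** One magic rule: a pattern that may start anywhere in [fromOffset, toOffset]. */
    static final class Rule {
        final byte[] pattern;
        final int fromOffset;
        final int toOffset;
        final String mimeType;
        final int priority;

        Rule(String pattern, int fromOffset, int toOffset, String mimeType, int priority) {
            this.pattern = pattern.getBytes(StandardCharsets.US_ASCII);
            this.fromOffset = fromOffset;
            this.toOffset = toOffset;
            this.mimeType = mimeType;
            this.priority = priority;
        }

        /** True if the pattern occurs starting at some offset in the allowed range. */
        boolean matches(byte[] data) {
            for (int off = fromOffset; off <= toOffset && off + pattern.length <= data.length; off++) {
                if (regionEquals(data, off)) {
                    return true;
                }
            }
            return false;
        }

        private boolean regionEquals(byte[] data, int off) {
            for (int i = 0; i < pattern.length; i++) {
                if (data[off + i] != pattern[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    // Assumed values for illustration only.
    static final List<Rule> RULES = Arrays.asList(
            new Rule("<h1>", 0, 0, "text/html", 60),            // (open)h1(close)
            new Rule("<h1 ", 0, 0, "text/html", 60),            // (open)h1(space)
            new Rule("\nFrom:", 0, 1024, "message/rfc822", 50), // assumed RFC822 priority
            new Rule("<html", 0, 64, "text/html", 40));         // lower-priority HTML block

    /** Returns the type of the highest-priority matching rule, or a generic fallback. */
    static String detect(String text) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        Rule best = null;
        for (Rule rule : RULES) {
            if (rule.matches(data) && (best == null || rule.priority > best.priority)) {
                best = rule;
            }
        }
        return best == null ? "application/octet-stream" : best.mimeType;
    }
}
```

Under these assumptions, an HTML fragment that opens with an h1 tag and then contains a From: line is reported as text/html, while a document that only matches the priority-40 HTML rule still loses to the RFC822 rule, which is the behaviour the issue describes.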
Re: Automatic updates?
I'm unable to show it now, but I never had a "merge" button. But I remember a "Only those with write access to this repository can merge pull requests" text, could it be that I need some additional permissions?

Tilman

On 19.05.2022 at 14:00, Tim Allison wrote:
> I just click the button. Is your username asfgit? Kidding! I have no
> idea why your name isn't showing up.
>
> On Thu, May 19, 2022 at 12:03 AM Tilman Hausherr wrote:
>
>> What do you do differently than I do? I noticed that in the recent
>> commit, your name appears in the PR, mine doesn't.
>>
>> https://github.com/apache/tika/pull/562/
>> https://github.com/apache/tika/pull/563/
>> "asfgit"
>>
>> https://github.com/apache/tika/pull/564/
>> "tballison"
>>
>> Could it be because I'm using my real mail address in commits and you're
>> using your apache mail address?
>>
>> Tilman
Re: Automatic updates?
Hmmm... what do you see here: https://gitbox.apache.org/boxer/

On Thu, May 19, 2022 at 11:58 AM Tilman Hausherr wrote:
> I'm unable to show it now, but I never had a "merge" button. But I
> remember a "Only those with write access to this repository can merge
> pull requests" text, could it be that I need some additional permissions?
>
> Tilman
>
> On 19.05.2022 at 14:00, Tim Allison wrote:
> > I just click the button. Is your username asfgit? Kidding! I have no
> > idea why your name isn't showing up.
> >
> > On Thu, May 19, 2022 at 12:03 AM Tilman Hausherr wrote:
> >
> >> What do you do differently than I do? I noticed that in the recent
> >> commit, your name appears in the PR, mine doesn't.
> >>
> >> https://github.com/apache/tika/pull/562/
> >> https://github.com/apache/tika/pull/563/
> >> "asfgit"
> >>
> >> https://github.com/apache/tika/pull/564/
> >> "tballison"
> >>
> >> Could it be because I'm using my real mail address in commits and you're
> >> using your apache mail address?
> >>
> >> Tilman
Re: Automatic updates?
On 19.05.2022 at 18:16, Tim Allison wrote:
> Hmmm... what do you see here: https://gitbox.apache.org/boxer/

Thank you, I hadn't known about that. I have now linked my account. The appearance of

https://github.com/apache/tika/pull/561/

has now changed, cool!

Tilman

> On Thu, May 19, 2022 at 11:58 AM Tilman Hausherr wrote:
>
>> I'm unable to show it now, but I never had a "merge" button. But I
>> remember a "Only those with write access to this repository can merge
>> pull requests" text, could it be that I need some additional permissions?
>>
>> Tilman
>>
>> On 19.05.2022 at 14:00, Tim Allison wrote:
>>> I just click the button. Is your username asfgit? Kidding! I have no
>>> idea why your name isn't showing up.
>>>
>>> On Thu, May 19, 2022 at 12:03 AM Tilman Hausherr wrote:
>>>
>>>> What do you do differently than I do? I noticed that in the recent
>>>> commit, your name appears in the PR, mine doesn't.
>>>>
>>>> https://github.com/apache/tika/pull/562/
>>>> https://github.com/apache/tika/pull/563/
>>>> "asfgit"
>>>>
>>>> https://github.com/apache/tika/pull/564/
>>>> "tballison"
>>>>
>>>> Could it be because I'm using my real mail address in commits and you're
>>>> using your apache mail address?
>>>>
>>>> Tilman
Re: Automatic updates?
It took me some googling... I had forgotten about that step. Great news. Thank you!

On Thu, May 19, 2022 at 12:27 PM Tilman Hausherr wrote:
> On 19.05.2022 at 18:16, Tim Allison wrote:
> > Hmmm... what do you see here: https://gitbox.apache.org/boxer/
>
> Thank you, I hadn't known about that. I have now linked my account. The
> appearance of
>
> https://github.com/apache/tika/pull/561/
>
> has now changed, cool!
>
> Tilman
>
> > On Thu, May 19, 2022 at 11:58 AM Tilman Hausherr wrote:
> >
> >> I'm unable to show it now, but I never had a "merge" button. But I
> >> remember a "Only those with write access to this repository can merge
> >> pull requests" text, could it be that I need some additional permissions?
> >>
> >> Tilman
> >>
> >> On 19.05.2022 at 14:00, Tim Allison wrote:
> >>> I just click the button. Is your username asfgit? Kidding! I have no
> >>> idea why your name isn't showing up.
> >>>
> >>> On Thu, May 19, 2022 at 12:03 AM Tilman Hausherr wrote:
> >>>
> >>>> What do you do differently than I do? I noticed that in the recent
> >>>> commit, your name appears in the PR, mine doesn't.
> >>>>
> >>>> https://github.com/apache/tika/pull/562/
> >>>> https://github.com/apache/tika/pull/563/
> >>>> "asfgit"
> >>>>
> >>>> https://github.com/apache/tika/pull/564/
> >>>> "tballison"
> >>>>
> >>>> Could it be because I'm using my real mail address in commits and you're
> >>>> using your apache mail address?
> >>>>
> >>>> Tilman
[jira] [Commented] (TIKA-3770) General upgrades for 1.28.3
[ https://issues.apache.org/jira/browse/TIKA-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539700#comment-17539700 ] Hudson commented on TIKA-3770: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #208 (See [https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/208/]) TIKA-3770: update micrometer (tilman: [https://github.com/apache/tika/commit/74b08d1234280da5e2475f1716b7a5437cfd]) * (edit) tika-server/pom.xml TIKA-3770: update zstd-jni (tilman: [https://github.com/apache/tika/commit/7fd12a3f53773f10aa8a8ec7d640e25a87658188]) * (edit) tika-parsers/pom.xml
[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539809#comment-17539809 ] Dan Coldrick commented on TIKA-1570: [~tallison] I've tested and it works, I've created a WIP page in confluence on how I got it to install as a Windows service. I needed a break from DWG's so picked this up instead :) Feel free to butcher my confluence page: [https://cwiki.apache.org/confluence/display/TIKA/TikaServer+Windows+Service+-+WIP] > Seeking a stop method for better use with Apache Commons Daemon > --- > > Key: TIKA-1570 > URL: https://issues.apache.org/jira/browse/TIKA-1570 > Project: Tika > Issue Type: Improvement > Components: server >Affects Versions: 1.7 >Reporter: Jason Borg >Priority: Minor > Fix For: 2.4.1 > > > I've got tika-server-1.7.jar from http://tika.apache.org/download.html > I've downloaded v1.0.15 of the Windows binaries for Apache Commons Daemon > from http://commons.apache.org/proper/commons-daemon/binaries.html > I can get Tika started as a service, but I can't determine what to use for a > stop method. > prunsrv.exe //IS//tika-daemon --DisplayName "Tika Daemon" --Classpath > "C:\Tika Service\tika-server-1.7.jar" --StartClass > "org.apache.tika.server.TikaServerCli" --StopClass > "org.apache.tika.server.TikaServerCli" --StartMethod main --StopMethod main > --Description "Tika Daemon Windows Service" --StartMode java --StopMode java > This starts, and works as I'd hope, but when trying to stop the service it > doesn't respond. Obviously org.apache.tika.server.TikaServerCli.main(string[] > args) isn't a suitable stop method, but I'm lost for alternatives. > Using Daemon in exe mode works for start, but gives inconsistent results for > stop. Adding a stop method to Tika would be ideal. -- This message was sent by Atlassian Jira (v8.20.7#820007)
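The request in this issue, a dedicated stop method that a service wrapper can call, can be sketched as below. This is a hypothetical wrapper, not the change that went into Tika and not part of the Tika or Commons Daemon APIs: the class name `ServiceLifecycleSketch` and its `start`/`stop` methods are assumptions for illustration, and the place where a real server would be launched and shut down is only marked with comments. The sketch assumes the wrapper expects the start method to block until the stop method is invoked from another thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical start/stop wrapper for a service manager; not Tika's actual code. */
final class ServiceLifecycleSketch {

    // Released by stop() so that the blocked start() call can return.
    private static final CountDownLatch STOP_SIGNAL = new CountDownLatch(1);
    private static final AtomicBoolean RUNNING = new AtomicBoolean(false);

    /** Would be named as the start method; blocks until stop() is called. */
    public static void start(String[] args) throws InterruptedException {
        RUNNING.set(true);
        // A real wrapper would launch the server here (placeholder).
        STOP_SIGNAL.await();
        // A real wrapper would shut the server down here (placeholder).
        RUNNING.set(false);
    }

    /** Would be named as the stop method; signals start() to finish. */
    public static void stop(String[] args) {
        STOP_SIGNAL.countDown();
    }

    static boolean isRunning() {
        return RUNNING.get();
    }
}
```

With a class shaped like this, the `--StartMethod` and `--StopMethod` options from the prunsrv command in the report would point at two different methods (`start` and `stop` in this sketch) instead of both pointing at `main`.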
[jira] [Commented] (TIKA-3770) General upgrades for 1.28.3
[ https://issues.apache.org/jira/browse/TIKA-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539896#comment-17539896 ] Hudson commented on TIKA-3770: -- SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #209 (See [https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/209/]) TIKA-3770: update lombok and jakarta.annotation-api (tilman: [https://github.com/apache/tika/commit/4009fc6a518059141ff54d4a005bddab72954938]) * (edit) tika-parsers/pom.xml * (edit) tika-parent/pom.xml
[jira] [Created] (TIKA-3771) Regression from TIKA-3687: Files wrongly detected as EML
Luís Filipe Nassif created TIKA-3771: Summary: Regression from TIKA-3687: Files wrongly detected as EML Key: TIKA-3771 URL: https://issues.apache.org/jira/browse/TIKA-3771 Project: Tika Issue Type: Bug Affects Versions: 2.4.0 Reporter: Luís Filipe Nassif Attachments: BEA498353ECFA1C440365BB434BBC228269917D7.png While running regression tests as part of upgrading from 1.x to Tika-2.4.0, I found that several hundred samples of different file types are now being detected as EML. This is caused by the rule added in TIKA-3687 in the minShouldMatch="2" clause. Attached is a sample PNG file that triggers this (it also has another \nDate: value in the first 1024 bytes). On an unrelated note, I tried to override the message/rfc822 mime definition with a custom-tika-mimetypes.xml in the classpath, but it had no effect; this used to work in Tika-1.x. Was that change intentional? I think user definitions should take precedence over Tika definitions, since they can change depending on domain or context (e.g. the same extension may be used by different applications). -- This message was sent by Atlassian Jira (v8.20.7#820007)
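The failure mode reported here can be reproduced in miniature with a self-contained sketch. It is not Tika's matcher: the class name is hypothetical, the window handling is simplified, and because the rule added in TIKA-3687 is not legible in this message, `\nSubject:` and `\nFrom:` are stand-in patterns next to the `\nDate:` value the report mentions. The sketch only shows why a "match at least N of these short header strings in the first 1024 bytes" test can fire on a binary file that happens to embed text metadata.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a minShouldMatch-style header test; not Tika's implementation. */
final class MinShouldMatchSketch {

    // Only the first 1024 bytes are inspected, mirroring the window in the report.
    static final int WINDOW = 1024;

    // Stand-in patterns; only "\nDate:" is taken from the report.
    static final List<String> PATTERNS = Arrays.asList("\nDate:", "\nSubject:", "\nFrom:");

    /** Counts how many distinct patterns occur inside the inspected window. */
    static int countMatches(byte[] data) {
        int limit = Math.min(data.length, WINDOW);
        // ISO-8859-1 maps every byte to one char, so binary data survives the conversion.
        String head = new String(data, 0, limit, StandardCharsets.ISO_8859_1);
        int hits = 0;
        for (String pattern : PATTERNS) {
            if (head.contains(pattern)) {
                hits++;
            }
        }
        return hits;
    }

    /** True if at least minShouldMatch patterns were found. */
    static boolean looksLikeRfc822(byte[] data, int minShouldMatch) {
        return countMatches(data) >= minShouldMatch;
    }

    /** A PNG-like byte sequence: real PNG signature followed by invented text metadata. */
    static byte[] pngLikeSample() {
        String sample = "\u0089PNG\r\n\u001a\n" + "tEXtComment\nDate: 2022-05-19\nSubject: sample";
        return sample.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

In this sketch the PNG-like sample satisfies a threshold of 2 even though it begins with the PNG signature, which suggests why each additional short pattern in such a clause widens the set of non-email files that can match.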
[jira] [Updated] (TIKA-3771) Regression from TIKA-3687: Files wrongly detected as EML
[ https://issues.apache.org/jira/browse/TIKA-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luís Filipe Nassif updated TIKA-3771: - Description: Running regression tests in the process of upgrading to Tika-2.4.0 from 1.x, I detected some hundreds of samples of different file types now are being detected as EML. This is caused by the rule added in TIKA-3687 in the minShouldMatch="2" clause. Attached is a sample PNG file that triggers this (it also has another \nDate: value in the first 1024 bytes). Another not related thing, I tried to override the message/rfc822 mime definition with a custom-tika-mimetypes.xml in classpath, but it had no effect, it used to work in Tika-1.x. Was that change intentional? I think user definitions should take precedence over Tika definitions, since they can change depending on domain or context (e.g. the same extension may be used by different applications). If it wasn't intentional, I'll open other issue. was: Running regression tests in the process of upgrading to Tika-2.4.0 from 1.x, I detected some hundreds of samples of different file types now are being detected as EML. This is caused by the rule added in TIKA-3687 in the minShouldMatch="2" clause. Attached is a sample PNG file that triggers this (it also has another \nDate: value in the first 1024 bytes). Another not related thing, I tried to override the message/rfc822 mime definition with a custom-tika-mimetypes.xml in classpath, but it had no effect, it used to work in Tika-1.x. Was that change intentional? I think user definitions should take precedence over Tika definitions, since they can change depending on domain or context (e.g. the same extension may be used by different applications). 
> Regression from TIKA-3687: Files wrongly detected as EML > - > > Key: TIKA-3771 > URL: https://issues.apache.org/jira/browse/TIKA-3771 > Project: Tika > Issue Type: Bug >Affects Versions: 2.4.0 >Reporter: Luís Filipe Nassif >Priority: Major > Attachments: BEA498353ECFA1C440365BB434BBC228269917D7.png > > > Running regression tests in the process of upgrading to Tika-2.4.0 from 1.x, > I detected some hundreds of samples of different file types now are being > detected as EML. This is caused by the offset="0:1024"/> rule added in TIKA-3687 in the minShouldMatch="2" clause. > Attached is a sample PNG file that triggers this (it also has another \nDate: > value in the first 1024 bytes). > Another not related thing, I tried to override the message/rfc822 mime > definition with a custom-tika-mimetypes.xml in classpath, but it had no > effect, it used to work in Tika-1.x. Was that change intentional? I think > user definitions should take precedence over Tika definitions, since they can > change depending on domain or context (e.g. the same extension may be used by > different applications). If it wasn't intentional, I'll open other issue. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (TIKA-3771) Regression from TIKA-3687: Files wrongly detected as EML
[ https://issues.apache.org/jira/browse/TIKA-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luís Filipe Nassif updated TIKA-3771: - Description: Running regression tests in the process of upgrading to Tika-2.4.0 from 1.x, I detected some hundreds of samples from 1M of different file types now are being detected as EML. This is caused by the rule added in TIKA-3687 in the minShouldMatch="2" clause. Attached is a sample PNG file that triggers this (it also has another \nDate: value in the first 1024 bytes). Another not related thing, I tried to override the message/rfc822 mime definition with a custom-tika-mimetypes.xml in classpath, but it had no effect, it used to work in Tika-1.x. Was that change intentional? I think user definitions should take precedence over Tika definitions, since they can change depending on domain or context (e.g. the same extension may be used by different applications). If it wasn't intentional, I'll open other issue. was: Running regression tests in the process of upgrading to Tika-2.4.0 from 1.x, I detected some hundreds of samples of different file types now are being detected as EML. This is caused by the rule added in TIKA-3687 in the minShouldMatch="2" clause. Attached is a sample PNG file that triggers this (it also has another \nDate: value in the first 1024 bytes). Another not related thing, I tried to override the message/rfc822 mime definition with a custom-tika-mimetypes.xml in classpath, but it had no effect, it used to work in Tika-1.x. Was that change intentional? I think user definitions should take precedence over Tika definitions, since they can change depending on domain or context (e.g. the same extension may be used by different applications). If it wasn't intentional, I'll open other issue. 
> Regression from TIKA-3687: Files wrongly detected as EML
> --------------------------------------------------------
>
>                 Key: TIKA-3771
>                 URL: https://issues.apache.org/jira/browse/TIKA-3771
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Luís Filipe Nassif
>            Priority: Major
>         Attachments: BEA498353ECFA1C440365BB434BBC228269917D7.png
>
>
> While running regression tests as part of upgrading from Tika 1.x to 2.4.0,
> I found several hundred samples, out of 1M files of various types, that are
> now detected as EML. This is caused by the type="string" offset="0:1024"/>
> rule added in TIKA-3687 in the minShouldMatch="2" clause. Attached is a
> sample PNG file that triggers this (it also has another \nDate: value in the
> first 1024 bytes).
> On an unrelated note, I tried to override the message/rfc822 mime
> definition with a custom-tika-mimetypes.xml on the classpath, but it had no
> effect; this used to work in Tika 1.x. Was that change intentional? I think
> user definitions should take precedence over Tika's definitions, since they can
> vary by domain or context (e.g. the same extension may be used by
> different applications). If it wasn't intentional, I'll open a separate issue.
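To make the reported failure mode concrete, here is a minimal sketch of how a "match at least two of these strings within the first 1024 bytes" clause can fire on a binary file. It is not Tika's detector and does not reproduce the actual rule from TIKA-3687: the marker list is hypothetical (only `\nDate:` is mentioned in the report), and the function names are invented for illustration.

```python
# Illustrative sketch only: NOT Tika's detector and NOT the real TIKA-3687 rule set.
WINDOW = 1024          # the report's rule scans offsets 0:1024
MIN_SHOULD_MATCH = 2   # the report's clause is minShouldMatch="2"

# Hypothetical header markers; only b"\nDate:" is named in the report.
MARKERS = [b"\nDate:", b"\nFrom:", b"\nSubject:"]


def count_marker_hits(data: bytes) -> int:
    """Count the distinct markers found in the first WINDOW bytes.

    Simplification: a marker must fit entirely inside the window.
    """
    head = data[:WINDOW]
    return sum(1 for marker in MARKERS if marker in head)


def looks_like_eml(data: bytes) -> bool:
    """Return True when at least MIN_SHOULD_MATCH markers are present."""
    return count_marker_hits(data) >= MIN_SHOULD_MATCH


# PNG signature followed by text metadata that happens to contain header-like strings.
png_like = b"\x89PNG\r\n\x1a\n" + b"\x00" * 32 + b"tEXt\nDate: 2022-05-20\nFrom: scanner"
# PNG signature followed by bytes with no header-like strings.
plain_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 64

print(looks_like_eml(png_like))   # True: two generic markers occur in the window
print(looks_like_eml(plain_png))  # False: no markers present
```

The point of the sketch is that short, generic strings searched anywhere in a 1 KB window carry little evidence on their own, so a low match threshold can be met by text metadata embedded in unrelated formats.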
[jira] [Commented] (TIKA-3770) General upgrades for 1.28.3
[ https://issues.apache.org/jira/browse/TIKA-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539909#comment-17539909 ]

Hudson commented on TIKA-3770:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #210 (See [https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/210/])
TIKA-3770: revert update of jakarta.annotation-api, fails on jdk11+ (tilman: [https://github.com/apache/tika/commit/8dc66e598bd4d2a481b293a8bd61fe00ecc7a1d0])
* (edit) tika-parent/pom.xml
* (edit) tika-parsers/pom.xml

> General upgrades for 1.28.3
> ---------------------------
>
>                 Key: TIKA-3770
>                 URL: https://issues.apache.org/jira/browse/TIKA-3770
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
[GitHub] [tika] dependabot[bot] opened a new pull request, #567: Bump maven-enforcer-plugin from 3.0.0-M3 to 3.0.0
dependabot[bot] opened a new pull request, #567:
URL: https://github.com/apache/tika/pull/567

Bumps [maven-enforcer-plugin](https://github.com/apache/maven-enforcer) from 3.0.0-M3 to 3.0.0.

Commits

- [b1b2282](https://github.com/apache/maven-enforcer/commit/b1b22822174bc92857a2e674c9a024035ee6d7cd) [maven-release-plugin] prepare release enforcer-3.0.0
- [70de3ad](https://github.com/apache/maven-enforcer/commit/70de3ad6b6cf83505fe049896e37d90ac81e13f3) Lock maven-jxr-plugin
- [da3f888](https://github.com/apache/maven-enforcer/commit/da3f8886d41522450c4b187a5f3562a4f6309610) Fix JavaDoc and lock sisu-maven-plugin
- [014253f](https://github.com/apache/maven-enforcer/commit/014253f19260b04eedccfd00678b2777f93fa4e3) update CI url
- [5409be8](https://github.com/apache/maven-enforcer/commit/5409be83dc3b621121e6222ad3830f8e95cf6614) [MENFORCER-211] wildcard ignore in requireReleaseDeps
- [335f26e](https://github.com/apache/maven-enforcer/commit/335f26e39d1f20e157c46485481e36f858135a14) [MENFORCER-364] requireFilesExist rule should be case sensitive
- [faaf5c1](https://github.com/apache/maven-enforcer/commit/faaf5c118bd9cda06cecca94ab3f9656c1cb7927) [MENFORCER-280] Enforcer dependency convergence stumbles on selenium-java
- [ab53fd9](https://github.com/apache/maven-enforcer/commit/ab53fd99607eb36554f2fd3af41847ad9568a5ed) [MENFORCER-357] RequirePluginVersions not recognizing versions-from-properties
- [1b8ca8f](https://github.com/apache/maven-enforcer/commit/1b8ca8f82815ec721e09abbd2330ce315893f2ed) [MENFORCER-388] Extends RequirePluginVersions with banMavenDefaults
- [ca73329](https://github.com/apache/maven-enforcer/commit/ca73329888b925899f4f57419a1d2ed208b1e0c4) [MENFORCER-359] RequirePluginVersions fails when versions are inherited
- Additional commits viewable in [compare view](https://github.com/apache/maven-enforcer/compare/enforcer-3.0.0-M3...enforcer-3.0.0)

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=org.apache.maven.plugins:maven-enforcer-plugin&package-manager=maven&previous-version=3.0.0-M3&new-version=3.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [tika] THausherr closed pull request #567: Bump maven-enforcer-plugin from 3.0.0-M3 to 3.0.0
THausherr closed pull request #567: Bump maven-enforcer-plugin from 3.0.0-M3 to 3.0.0
URL: https://github.com/apache/tika/pull/567
[GitHub] [tika] THausherr commented on pull request #567: Bump maven-enforcer-plugin from 3.0.0-M3 to 3.0.0
THausherr commented on PR #567:
URL: https://github.com/apache/tika/pull/567#issuecomment-1132514147

Can't do because of [MENFORCER-393](https://issues.apache.org/jira/browse/MENFORCER-393).
[GitHub] [tika] dependabot[bot] commented on pull request #567: Bump maven-enforcer-plugin from 3.0.0-M3 to 3.0.0
dependabot[bot] commented on PR #567:
URL: https://github.com/apache/tika/pull/567#issuecomment-1132514169

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting `@dependabot ignore this major version` or `@dependabot ignore this minor version`. You can also ignore all major, minor, or patch releases for a dependency by adding an [`ignore` condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore) with the desired `update_types` to your config file.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.
[GitHub] [tika] dependabot[bot] opened a new pull request, #568: Bump jackson-databind from 2.13.2.2 to 2.13.3
dependabot[bot] opened a new pull request, #568:
URL: https://github.com/apache/tika/pull/568

Bumps [jackson-databind](https://github.com/FasterXML/jackson) from 2.13.2.2 to 2.13.3.

Commits

- See full diff in [compare view](https://github.com/FasterXML/jackson/commits)

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=com.fasterxml.jackson.core:jackson-databind&package-manager=maven&previous-version=2.13.2.2&new-version=2.13.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)