[jira] [Commented] (TIKA-3327) Simple server metrics monitoring (server status over JMX)

2021-12-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459670#comment-17459670
 ] 

ASF GitHub Bot commented on TIKA-3327:
--

lewismc commented on a change in pull request #417:
URL: https://github.com/apache/tika/pull/417#discussion_r769277762



##
File path: 
tika-server/src/main/java/org/apache/tika/server/mbean/ServerStatusExporter.java
##
@@ -0,0 +1,66 @@
+package org.apache.tika.server.mbean;

Review comment:
   Missing license header?

##
File path: 
tika-server/src/main/java/org/apache/tika/server/mbean/ServerStatusExporterMBean.java
##
@@ -0,0 +1,44 @@
+package org.apache.tika.server.mbean;

Review comment:
   Missing license header?

##
File path: tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
##
@@ -70,6 +72,8 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import javax.management.*;

Review comment:
   I would remove wildcard imports...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Simple server metrics monitoring (server status over JMX)
> -
>
> Key: TIKA-3327
> URL: https://issues.apache.org/jira/browse/TIKA-3327
> Project: Tika
> Issue Type: Improvement
> Components: server
> Affects Versions: 2.0.0, 1.25
> Reporter: Subhajit Das
> Priority: Minor
> Fix For: 1.26
>
>
> Currently, the /status endpoint is the only simple monitoring endpoint.
> It cannot be used directly by the standard monitoring systems that are
> available, and there are no JMX MBeans or other endpoints.
>
> The status data can be exposed as an MBean.
>
> This MBean can then be used by something like the Prometheus JMX exporter
> to export the data to Prometheus.
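
The idea in the description above can be sketched with the JDK's own JMX classes. This is only an illustration under stated assumptions, not the code from the pull request: the class ServerStatusJmxSketch, the interface StatusView, the stand-in FixedStatus, and the object name org.example.tika:type=ServerStatus are all hypothetical, and the getter names merely mirror the ServerStatus getters quoted elsewhere in this thread, with assumed return types. Explicit javax.management imports are used rather than a wildcard.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class ServerStatusJmxSketch {

    /** Hypothetical management interface; each getter becomes a read-only JMX attribute. */
    public interface StatusView {
        long getMillisSinceLastParseStarted();
        int getNumRestarts();
        long getFilesProcessed();
    }

    /** Hypothetical stand-in for the real server status object; it returns fixed values. */
    public static class FixedStatus implements StatusView {
        private final long millisSinceLastParseStarted;
        private final int numRestarts;
        private final long filesProcessed;

        public FixedStatus(long millisSinceLastParseStarted, int numRestarts, long filesProcessed) {
            this.millisSinceLastParseStarted = millisSinceLastParseStarted;
            this.numRestarts = numRestarts;
            this.filesProcessed = filesProcessed;
        }

        @Override
        public long getMillisSinceLastParseStarted() {
            return millisSinceLastParseStarted;
        }

        @Override
        public int getNumRestarts() {
            return numRestarts;
        }

        @Override
        public long getFilesProcessed() {
            return filesProcessed;
        }
    }

    /** Wraps the status object in a StandardMBean and registers it with the platform MBean server. */
    public static ObjectName register(StatusView status, String name) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        server.registerMBean(new StandardMBean(status, StatusView.class), objectName);
        return objectName;
    }

    public static void main(String[] args) throws Exception {
        // The object name is an assumption chosen for this sketch.
        ObjectName name = register(new FixedStatus(1500L, 2, 42L), "org.example.tika:type=ServerStatus");
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Attribute names are derived from the getters: getNumRestarts -> "NumRestarts".
        System.out.println("NumRestarts = " + server.getAttribute(name, "NumRestarts"));
    }
}
```

Once registered this way, the attributes are visible to any JMX client (for example JConsole), which is the precondition for a JMX-to-Prometheus bridge to scrape them.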



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [tika] lewismc commented on a change in pull request #417: TIKA-3327 Simple server metrics monitoring

2021-12-14 Thread GitBox


lewismc commented on a change in pull request #417:
URL: https://github.com/apache/tika/pull/417#discussion_r769277762







[jira] [Commented] (TIKA-3353) Tika Server Production ready monitoring (Prometheus and JMX)

2021-12-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459668#comment-17459668
 ] 

ASF GitHub Bot commented on TIKA-3353:
--

lewismc commented on a change in pull request #429:
URL: https://github.com/apache/tika/pull/429#discussion_r769272590



##
File path: 
tika-server/src/main/java/org/apache/tika/server/metrics/ServerStatusMetrics.java
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.server.metrics;
+
+import io.micrometer.core.instrument.Gauge;
+import io.micrometer.core.instrument.MeterRegistry;
+import io.micrometer.core.instrument.binder.MeterBinder;
+import org.apache.tika.server.ServerStatus;
+import org.jetbrains.annotations.NotNull;
+
+/**
+ * Server status metrics meter binder.
+ */
+public class ServerStatusMetrics implements MeterBinder {
+
+    /**
+     * The server status currently in use.
+     */
+    private ServerStatus serverStatus;
+
+    /**
+     * Initializes server status metrics with the server status object.
+     * @param serverStatus the server status.
+     */
+    public ServerStatusMetrics(ServerStatus serverStatus) {
+        this.serverStatus = serverStatus;
+    }
+
+    /**
+     * Binds server status metrics to meter registry.
+     * @param meterRegistry the meter registry to bind to.
+     */
+    @Override
+    public void bindTo(@NotNull MeterRegistry meterRegistry) {
+        Gauge.builder("server.status.lastparsed", serverStatus, ServerStatus::getMillisSinceLastParseStarted)
+                .description("Last parsed in milliseconds")
+                .register(meterRegistry);
+        Gauge.builder("server.status.restarts", serverStatus, ServerStatus::getNumRestarts)
+                .description("Last parsed in milliseconds")

Review comment:
   The help here is incorrect.


[GitHub] [tika] lewismc commented on a change in pull request #429: [TIKA-3353] Prometheus and JMX monitoring over micrometer

2021-12-14 Thread GitBox


lewismc commented on a change in pull request #429:
URL: https://github.com/apache/tika/pull/429#discussion_r769272590




##
File path: 
tika-server/src/main/java/org/apache/tika/server/metrics/ServerStatusMetrics.java
##
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.tika.server.metrics;
+
+import io.micrometer.core.instrument.Gauge;
+import io.micrometer.core.instrument.MeterRegistry;
+import io.micrometer.core.instrument.binder.MeterBinder;
+import org.apache.tika.server.ServerStatus;
+import org.jetbrains.annotations.NotNull;
+
+/**
+ * Server status metrics meter binder.
+ */
+public class ServerStatusMetrics implements MeterBinder {
+
+    /**
+     * The server status currently in use.
+     */
+    private ServerStatus serverStatus;
+
+    /**
+     * Initializes server status metrics with the server status object.
+     * @param serverStatus the server status.
+     */
+    public ServerStatusMetrics(ServerStatus serverStatus) {
+        this.serverStatus = serverStatus;
+    }
+
+    /**
+     * Binds server status metrics to meter registry.
+     * @param meterRegistry the meter registry to bind to.
+     */
+    @Override
+    public void bindTo(@NotNull MeterRegistry meterRegistry) {
+        Gauge.builder("server.status.lastparsed", serverStatus, ServerStatus::getMillisSinceLastParseStarted)
+                .description("Last parsed in milliseconds")
+                .register(meterRegistry);
+        Gauge.builder("server.status.restarts", serverStatus, ServerStatus::getNumRestarts)
+                .description("Last parsed in milliseconds")
+                .register(meterRegistry);
+        Gauge.builder("server.status.files", serverStatus, ServerStatus::getFilesProcessed)
+                .description("Last parsed in milliseconds")

Review comment:
   The help here is i

[jira] [Comment Edited] (TIKA-3614) Trying to upgrade from 1.27 to 2.1.0

2021-12-14 Thread Vamsi Molli (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17456284#comment-17456284
 ] 

Vamsi Molli edited comment on TIKA-3614 at 12/15/21, 5:57 AM:
--

I used tika-core and am getting errors resolving the import
org.apache.tika.metadata.TikaMetadataKeys and the constant
Metadata.RESOURCE_NAME_KEY from org.apache.tika.metadata.

api(group: 'org.apache.tika', name: 'tika-core', version: '2.1.0') {
    // Tika requires a version of jackson that conflicts with other dependencies.
    exclude group: "com.fasterxml.jackson.core"
}


                     ^
  symbol:   variable TikaMetadataKeys
  location: class FileTypeDetector


was (Author: vamsi452):
I used tika-core and am getting errors resolving the import
org.apache.tika.metadata.TikaMetadataKeys and the constant
Metadata.RESOURCE_NAME_KEY from org.apache.tika.metadata.

api(group: 'org.apache.tika', name: 'tika-core', version: '2.1.0') {
    // Tika requires a version of jackson that conflicts with other dependencies.
    exclude group: "com.fasterxml.jackson.core"
}



/home/ubuntu/projects/epiq/ls/Qmulus_Backend/AppCommon/src/com/stormed/common/utils/FileTypeDetector.java:13: error: cannot find symbol
import org.apache.tika.metadata.TikaMetadataKeys;
                               ^
  symbol:   class TikaMetadataKeys
  location: package org.apache.tika.metadata
/home/ubuntu/projects/epiq/ls/Qmulus_Backend/AppCommon/src/com/stormed/common/utils/FileTypeDetector.java:74: error: cannot find symbol
            metadata.add(Metadata.RESOURCE_NAME_KEY, paths);
                                 ^
  symbol:   variable RESOURCE_NAME_KEY
  location: class Metadata
/home/ubuntu/projects/epiq/ls/Qmulus_Backend/AppCommon/src/com/stormed/common/utils/FileTypeDetector.java:145: error: cannot find symbol
        metadata.add(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
                     ^
  symbol:   variable TikaMetadataKeys
  location: class FileTypeDetector

> Trying to upgrade from 1.27 to 2.1.0
> 
>
> Key: TIKA-3614
> URL: https://issues.apache.org/jira/browse/TIKA-3614
> Project: Tika
> Issue Type: Bug
> Components: build
> Affects Versions: 2.1.0
> Reporter: Vamsi Molli
> Priority: Major
> Labels: gradle
>
> Currently, my application uses Tika 1.27. In the Gradle file, we declare
> the dependency as shown below to download and use the Tika components.
>
> api(group: 'org.apache.tika', name: 'tika-parsers', version: '1.27') {
>     // Tika requires a version of jackson that conflicts with other dependencies.
>     exclude group: "com.fasterxml.jackson.core"
> }
>
> But when trying to update to 2.1.0 with the code below, I see that some of
> the imports are missing.
>
> import org.apache.tika.config.TikaConfig;
> import org.apache.tika.detect.Detector;
> import org.apache.tika.exception.TikaException;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.HttpHeaders;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.metadata.TikaMetadataKeys;
> import org.apache.tika.mime.MediaType;
> import org.apache.tika.mime.MimeType;
> import org.apache.tika.mime.MimeTypeException;
> import org.apache.tika.parser.AutoDetectParser;
> import org.apache.tika.parser.ParseContext;
>
> I tried the dependency below, which leaves the above imports missing.
>
> api(group: 'org.apache.tika', name: 'tika-parsers-standard-package', version: '2.1.0') {
>     // Tika requires a version of jackson that conflicts with other dependencies.
>     exclude group: "com.fasterxml.jackson.core"
> }
>
> Please let me know which imports I need to change to fix the above issues.





[jira] [Closed] (TIKA-3614) Trying to upgrade from 1.27 to 2.1.0

2021-12-14 Thread Vamsi Molli (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vamsi Molli closed TIKA-3614.
-
Resolution: Fixed



[jira] [Comment Edited] (TIKA-3614) Trying to upgrade from 1.27 to 2.1.0

2021-12-14 Thread Vamsi Molli (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457014#comment-17457014
 ] 

Vamsi Molli edited comment on TIKA-3614 at 12/15/21, 5:56 AM:
--

I am seeing runtime errors.


        
        
        
        
        
    

"Message": "Failed parsing Tika config. 
Error:org.apache.tika.exception.TikaConfigException: Unable to find a detector 
class: org.apache.tika.detect.zip.ZipContainerDetector",


was (Author: vamsi452):
Seeing run time errors.


        
        
        
        
        
    

{
    "Message": "Failed to get natural metadata with tika (Metadata might not be 
complete)",
    "StackTrace": "org.apache.tika.exception.TikaConfigException: Unable to 
find a detector class: org.apache.tika.detect.zip.ZipContainerDetector\n\tat 
org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:739)\n\tat 
org.apache.tika.config.TikaConfig$XmlLoader.loadOverall(TikaConfig.java:610)\n\tat
 org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:153)\n\tat 
org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:139)\n\tat 
org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:131)\n\tat 
org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:99)\n\tat 
org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:95)\n\tat 
com.stormed.common.utils.TikaConfigFactory.getTikaConfig(TikaConfigFactory.java:18)\n\tat
 com.stormed.processing.NaturalFile.parseMetadata(NaturalFile.java:87)\n\tat 
com.stormed.processing.AbstractFileProduct.addNatural_Metadata(AbstractFileProduct.java:134)\n\tat
 
com.stormed.processing.ProcessingMain.processing(ProcessingMain.java:540)\n\tat 
com.stormed.processing.ProcessingMain.run(ProcessingMain.java:112)\n\tat 
com.stormed.processing.common.ProcessingBuilder.run(ProcessingBuilder.java:28)\n\tat
 com.stormed.proxy.AppRunner.run(AppRunner.java:21)\n\tat 
com.stormed.proxy.ProxyMain.runApp(ProxyMain.java:274)\n\tat 
com.stormed.proxy.ProxyMain.lambda$main$0(ProxyMain.java:122)\n\tat 
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat
 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat
 java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: 
java.lang.ClassNotFoundException: Service class 
org.apache.tika.detect.zip.ZipContainerDetector is an interface\n\tat 
org.apache.tika.config.ServiceLoader.getServiceClass(ServiceLoader.java:215)\n\tat
 
org.apache.tika.config.TikaConfig$XmlLoader.loadOne(TikaConfig.java:659)\n\t... 
19 more\n",
    "ProjectId": 1,
    "DocumentId": 132,
    "Timestamp": "2021/12/10 03:40:17.458",
    "FragmentId": 153,
    "LogLevel": "Error",
    "Exception": "Unable to find a detector class: 
org.apache.tika.detect.zip.ZipContainerDetector",
    "WorkerIp": 12701001,
    "JobType": "Local Processing",
    "ProjectNormName": "test213_1",
    "UserId": 1,
    "ProjectDbName": "test213_1",
    "ThreadName": "pool-1-thread-2",
    "JobId": 32
}


"Message": "Failed parsing Tika config. 
Error:org.apache.tika.exception.TikaConfigException: Unable to find a detector 
class: org.apache.tika.detect.zip.ZipContainerDetector",


[jira] [Reopened] (TIKA-3614) Trying to upgrade from 1.27 to 2.1.0

2021-12-14 Thread Vamsi Molli (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vamsi Molli reopened TIKA-3614:
---



[jira] [Closed] (TIKA-3615) Missing class file while upgrade to TIka 2.1.0

2021-12-14 Thread Vamsi Molli (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vamsi Molli closed TIKA-3615.
-
Resolution: Fixed

> Missing class file while upgrade to TIka 2.1.0
> --
>
> Key: TIKA-3615
> URL: https://issues.apache.org/jira/browse/TIKA-3615
> Project: Tika
> Issue Type: Test
> Components: detector, parser
> Affects Versions: 2.1.0
> Reporter: Vamsi Molli
> Priority: Major
>
> A class-not-found exception occurs for the following:
> 
> 
> 
> 
> 
> 
> 
> 





[jira] [Commented] (TIKA-3619) Augment README with build prerequisites

2021-12-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459664#comment-17459664
 ] 

ASF GitHub Bot commented on TIKA-3619:
--

lewismc opened a new pull request #464:
URL: https://github.com/apache/tika/pull/464


   This trivial issue addresses https://issues.apache.org/jira/browse/TIKA-3619




> Augment README with build prerequisites
> ---
>
> Key: TIKA-3619
> URL: https://issues.apache.org/jira/browse/TIKA-3619
> Project: Tika
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 2.2.0
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
>
> When [reviewing the 2.2.0 RC 
> |https://lists.apache.org/thread/pfwm8sn7w3lsrsckd8b9v3b32byj4zms] I became 
> aware that although Docker IS required to build tika-pipes modules, there is 
> no guidance to reflect that.
> I think we could clean up the README to reflect the installation prerequisites.





[GitHub] [tika] lewismc opened a new pull request #464: TIKA-3619 Augment README with build prerequisites

2021-12-14 Thread GitBox


lewismc opened a new pull request #464:
URL: https://github.com/apache/tika/pull/464


   This trivial issue addresses https://issues.apache.org/jira/browse/TIKA-3619






Re: [DISCUSS] upgrading log4j to to log4j2 in Tika's 1.x branch

2021-12-14 Thread Tilman Hausherr

On 13.12.2021 at 19:05, Tim Allison wrote:

> All,
>    I'm currently in the process of building the rc1 for Tika 2.x. On
> TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
> log4j2 in the 1.x branch.  I think we avoided that because it would be
> a breaking change(?).  There are security vulns in log4j and it hit EOL
> in August 2015.
>    Should we upgrade the Tika 1.x branch for log4j2?

Yes

Tilman

>    Best,
>
> Tim
>
> [1] https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457595#comment-17457595





Re: [DISCUSS] upgrading log4j to to log4j2 in Tika's 1.x branch

2021-12-14 Thread Konstantin Gribov
Hi, folks.

I'm +1 both to updating to log4j2 or logback and to supporting security
updates for some time if we can, while encouraging migration to 2.2+ ASAP.
Maybe we should publish an EOL date in the 2.2.0 announcement if we haven't
already. That would give users a time frame for migration and limit the
committers' burden of supporting 1.x, with a transparent EOL date.

Just my 2c

-- 
Best regards,
Konstantin Gribov.


On Wed, Dec 15, 2021 at 4:05 AM Luís Filipe Nassif wrote:

> Sorry about the additional work, Tim. I thought upgrading from log4j-1.x to
> 2.x on Tika-1.x might not be that hard, and I didn't know about the breaking
> changes.
>
> Related to Eric's email, would we support Tika-1.x security updates for
> a while (that was my intent with the proposal above)? Was this already
> discussed?
>
> Best regards,
> Luis Filipe
>
>
>
> On Mon, Dec 13, 2021 at 17:23, Tim Allison wrote:
>
> > Yes.  That was the reasoning behind my -0.  I don't think this will
> > destroy our resources, but yes, please do migrate to 2.x asap.
> >
> >
> > On Mon, Dec 13, 2021 at 3:13 PM Eric Pugh
> >  wrote:
> > >
> > > Isn’t the goal of Tika 2 to mean that we no longer work on Tika 1?
> >  Does the Tika community have enough developer bandwidth to continue to
> > maintain Tika 1 while also pushing forward on Tika 2?
> > >
> > > I worry that we’ll fall into that situation where people just end up
> > using Tika 1 for forever, especially if there are new updates to it that
> > are happening, which then encourages folks not to move to Tika 2.
> > >
> > >
> > >
> > >
> > > > On Dec 13, 2021, at 2:49 PM, Tim Allison 
> wrote:
> > > >
> > > > Sounds like 2 +1 to my -0. :D  I'll start working on this now.
> > > >
> > > > On Mon, Dec 13, 2021 at 2:09 PM Nicholas DiPiazza
> > > >  wrote:
> > > >>
> > > >> I prefer upgrade to log4j2
> > > >>
> > > >> On Mon, Dec 13, 2021, 12:05 PM Tim Allison 
> > wrote:
> > > >>
> > > >>> All,
> > > >>>  I'm currently in the process of building the rc1 for Tika 2.x. On
> > > >>> TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
> > > >>> log4j2 in the 1.x branch.  I think we avoided that because it would
> > be
> > > >>> a breaking change(?).  There are security vulns in log4j and it hit
> > > >>> EOL
> > > >>> in August 2015.
> > > >>>  Should we upgrade the Tika 1.x branch for log4j2?
> > > >>>
> > > >>>  Best,
> > > >>>
> > > >>>   Tim
> > > >>>
> > > >>>
> > > >>> [1]
> > > >>>
> >
> https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457595#comment-17457595
> > > >>>
> > >
> > > ___
> > > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467
> |
> > http://www.opensourceconnections.com <
> > http://www.opensourceconnections.com/> | My Free/Busy <
> > http://tinyurl.com/eric-cal>
> > > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> >
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
> > >
> > > This e-mail and all contents, including attachments, is considered to
> be
> > Company Confidential unless explicitly stated otherwise, regardless of
> > whether attachments are marked as such.
> > >
> >
>


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Konstantin Gribov
Hi, folks.

Built successfully on ArchLinux, OpenJDK 11 & 17 (Temurin-11.0.13+8 &
17.0.1+12) w/ Tesseract 4.1.1, Leptonica 1.82.0 except:
* org.apache.tika.parser.ocr.TesseractOCRParserTest.confirmMultiPageTiffHandling
  (still extracts "Page?2" instead of "Page 2" on my laptop);
* bunch of potential CVEs reported in age-recognizer due to old Netty,
Hadoop, Avro, Mesos, Spark (web framework), Log4j 1.x, Jackson, Commons
BeanUtils, Scala, Commons Collections, Zookeeper, I'm not sure if any
affect Tika;
* some slf4j and log4j2 issues in tests (multiple bindings or absent
implementation).

I think we can ignore CVE-2021-45046 [1] for now and update to log4j
2.16.0 in a few weeks. It has a much narrower scope, and we don't use
MDC/ThreadContext in a vulnerable way from what I see.

Checksums and GPG signatures seem fine.
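
For anyone repeating the checksum and signature check on a release
candidate, a minimal sketch follows. The helper name `verify_sha512` and
the artifact file names in the comments are hypothetical; it assumes GNU
coreutils (`sha512sum`) and `awk` are available.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeeds only if the SHA-512 digest of FILE equals EXPECTED_HEX.
# (Hypothetical helper, not part of the Tika release tooling.)
verify_sha512() {
  actual="$(sha512sum "$1" | awk '{print $1}')"
  [ "$actual" = "$2" ]
}

# Signature check with GnuPG (file names are placeholders; not run here):
#   gpg --import KEYS
#   gpg --verify tika-src.zip.asc tika-src.zip
```

The expected digest is passed as an argument rather than read from a
`.sha512` file because the layout of those files varies between projects.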

[x] +1 Release this package as Apache Tika 2.2.0
[ ] -1 Do not release this package because...

[1]: https://www.cve.org/CVERecord?id=CVE-2021-45046

-- 
Best regards,
Konstantin Gribov.


On Wed, Dec 15, 2021 at 1:04 AM Oleg Tikhonov 
wrote:

> +1
>
> > On 15 Dec 2021, at 0:01, Tim Allison  wrote:
> >
> > +1
> >
> > On Tue, Dec 14, 2021 at 4:31 PM Lewis John McGibbney  >
> > wrote:
> >
> >> I'll submit a PR for the README but I think it's also worthwhile to
> augment
> >> the release management guide so that the message to review the release
> >> candidate includes this information.
> >> lewismc
> >>
> >> On 2021/12/14 20:17:05 Tim Allison wrote:
> >>> Y, you're right. Lewis, where should we mention the Docker requirement
> >>> on our site?
> >>>
> >>> On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney <
> lewi...@apache.org>
> >> wrote:
> >>>>
> >>>> Hi Ken,
> >>>>
> >>>> On 2021/12/13 22:38:49 Ken Krugler wrote:
> >>>>> That error looks like you’ve got a connection issue with the Maven
> >>>>> central repo…
> >>>>>
> >>>>> — Ken
> >>>>
> >>>> Yes you are correct :)
> >>>>
> >>>> Once that issue sorted itself out my local build passed so my +1
> >>>> stands.
> >>>>
> >>>> I think it is worthwhile us stating that Docker is a prerequisite for
> >>>> installing from source. This is required for the tika-pipes* modules.
> >>>>
> >>>> lewismc
> >>>
> >>
>
>


Re: [DISCUSS] upgrading log4j to log4j2 in Tika's 1.x branch

2021-12-14 Thread Luís Filipe Nassif
Sorry about the additional work, Tim. I thought upgrading from log4j-1.x to
2.x on Tika-1.x might not be that hard, and I didn't know about the breaking
changes.

Related to Eric's email, would we support Tika-1.x security updates for
a while (that was my intent with the proposal above)? Was this already
discussed?

Best regards,
Luis Filipe



On Mon, Dec 13, 2021 at 17:23, Tim Allison 
wrote:

> Yes.  That was the reasoning behind my -0.  I don't think this will
> destroy our resources, but yes, please do migrate to 2.x asap.
>
>
> On Mon, Dec 13, 2021 at 3:13 PM Eric Pugh
>  wrote:
> >
> > Isn’t the goal of Tika 2 to mean that we no longer work on Tika 1?
>  Does the Tika community have enough developer bandwidth to continue to
> maintain Tika 1 while also pushing forward on Tika 2?
> >
> > I worry that we’ll fall into that situation where people just end up
> using Tika 1 for forever, especially if there are new updates to it that
> are happening, which then encourages folks not to move to Tika 2.
> >
> >
> >
> >
> > > On Dec 13, 2021, at 2:49 PM, Tim Allison  wrote:
> > >
> > > Sounds like 2 +1 to my -0. :D  I'll start working on this now.
> > >
> > > On Mon, Dec 13, 2021 at 2:09 PM Nicholas DiPiazza
> > >  wrote:
> > >>
> > >> I prefer upgrade to log4j2
> > >>
> > >> On Mon, Dec 13, 2021, 12:05 PM Tim Allison 
> wrote:
> > >>
> > >>> All,
> > >>>  I'm currently in the process of building the rc1 for Tika 2.x. On
> > >>> TIKA-3616, Luís Filipe Nassif asked if we could upgrade log4j to
> > >>> log4j2 in the 1.x branch.  I think we avoided that because it would
> be
> > >>> a breaking change(?).  There are security vulns in log4j and it hit
> > >>> EOL
> > >>> in August 2015.
> > >>>  Should we upgrade the Tika 1.x branch for log4j2?
> > >>>
> > >>>  Best,
> > >>>
> > >>>   Tim
> > >>>
> > >>>
> > >>> [1]
> > >>>
> https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17457595#comment-17457595
> > >>>
> >
> > ___
> > Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> > Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw
> >
> > This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
> >
>


[jira] [Created] (TIKA-3619) Augment README with build prerequisites

2021-12-14 Thread Lewis John McGibbney (Jira)
Lewis John McGibbney created TIKA-3619:
--

 Summary: Augment README with build prerequisites
 Key: TIKA-3619
 URL: https://issues.apache.org/jira/browse/TIKA-3619
 Project: Tika
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.2.0
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney


When [reviewing the 2.2.0 RC|https://lists.apache.org/thread/pfwm8sn7w3lsrsckd8b9v3b32byj4zms] I became 
aware that although Docker is required to build the tika-pipes modules, there is no 
guidance to reflect that.
I think we could clean up the README to reflect the installation prerequisites.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Oleg Tikhonov
+1 

> On 15 Dec 2021, at 0:01, Tim Allison  wrote:
> 
> +1
> 
> On Tue, Dec 14, 2021 at 4:31 PM Lewis John McGibbney 
> wrote:
> 
>> I'll submit a PR for the README but I think it's also worthwhile to augment
>> the release management guide so that the message to review the release
>> candidate includes this information.
>> lewismc
>> 
>> On 2021/12/14 20:17:05 Tim Allison wrote:
>>> Y, you're right. Lewis, where should we mention the Docker requirement
>>> on our site?
>>> 
>>> On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney 
>> wrote:
 
>>>> Hi Ken,
>>>>
>>>> On 2021/12/13 22:38:49 Ken Krugler wrote:
>>>>> That error looks like you’ve got a connection issue with the Maven
>>>>> central repo…
>>>>>
>>>>> — Ken
>>>>
>>>> Yes you are correct :)
>>>>
>>>> Once that issue sorted itself out my local build passed so my +1
>>>> stands.
>>>>
>>>> I think it is worthwhile us stating that Docker is a prerequisite for
>>>> installing from source. This is required for the tika-pipes* modules.
>>>>
>>>> lewismc
>>> 
>> 



Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Tim Allison
+1

On Tue, Dec 14, 2021 at 4:31 PM Lewis John McGibbney 
wrote:

> I'll submit a PR for the README but I think it's also worthwhile to augment
> the release management guide so that the message to review the release
> candidate includes this information.
> lewismc
>
> On 2021/12/14 20:17:05 Tim Allison wrote:
> > Y, you're right. Lewis, where should we mention the Docker requirement
> > on our site?
> >
> > On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney 
> wrote:
> > >
> > > Hi Ken,
> > >
> > > On 2021/12/13 22:38:49 Ken Krugler wrote:
> > > > That error looks like you’ve got a connection issue with the Maven
> central repo…
> > > >
> > > > — Ken
> > >
> > > Yes you are correct :)
> > >
> > > Once that issue sorted itself out my local build passed so my +1
> stands.
> > >
> > > I think it is worthwhile us stating that Docker is a prerequisite for
> installing from source. This is required for the tika-pipes* modules.
> > >
> > > lewismc
> >
>


[jira] [Commented] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-14 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459506#comment-17459506
 ] 

Hudson commented on TIKA-3618:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #152 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/152/])
TIKA-3618 -- upgrade to log4j2 in the 1.x branch (tallison: 
[https://github.com/apache/tika/commit/705d06a5742b02cd5aea87e49b2b811274d440d4])
* (delete) tika-batch/src/test/resources/log4j_process.properties
* (edit) tika-parsers/pom.xml
* (edit) CHANGES.txt
* (delete) tika-server/src/test/resources/log4j.properties
* (add) tika-parsers/src/test/resources/log4j2.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/XMLLogMsgHandler.java
* (add) tika-server/src/main/resources/log4j2.xml
* (delete) tika-server/src/main/resources/log4j.properties
* (edit) 
tika-server/src/main/java/org/apache/tika/server/metrics/MetricsHelper.java
* (edit) tika-langdetect/pom.xml
* (delete) tika-app/src/main/resources/log4j_batch_process.properties
* (add) tika-batch/src/test/resources/log4j2-on.properties
* (delete) 
tika-server/src/main/java/org/apache/tika/server/metrics/Log4JMetrics.java
* (edit) tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* (add) tika-batch/src/test/resources/log4j2.xml
* (add) tika-app/src/main/resources/log4j2.xml
* (edit) 
tika-app/src/test/java/org/apache/tika/cli/TikaCLIBatchIntegrationTest.java
* (edit) 
tika-server/src/test/java/org/apache/tika/server/MetricsResourceTest.java
* (add) tika-batch/src/test/resources/log4j2_process.properties
* (edit) tika-core/pom.xml
* (delete) tika-bundle/src/test/resources/log4j.properties
* (edit) tika-dl/pom.xml
* (edit) tika-batch/pom.xml
* (edit) tika-fuzzing/pom.xml
* (edit) tika-app/pom.xml
* (edit) 
tika-app/src/test/java/org/apache/tika/cli/TikaCLIBatchCommandLineTest.java
* (edit) tika-batch/src/test/java/org/apache/tika/batch/fs/BatchProcessTest.java
* (add) tika-core/src/test/resources/log4j2.xml
* (add) tika-eval/src/main/resources/log4j2.xml
* (add) tika-app/src/main/resources/log4j2_batch_process.properties
* (delete) tika-batch/src/test/resources/log4j-on.properties
* (add) tika-bundle/src/test/resources/log4j2.xml
* (edit) tika-server/pom.xml
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (delete) tika-core/src/test/resources/log4j.properties
* (delete) tika-fuzzing/src/main/resources/log4j.properties
* (edit) 
tika-server/src/test/java/org/apache/tika/server/TikaServerIntegrationTest.java
* (add) tika-langdetect/src/test/resources/log4j2.xml
* (edit) tika-bundle/pom.xml
* (edit) tika-batch/src/test/java/org/apache/tika/batch/fs/BatchDriverTest.java
* (add) 
tika-server/src/main/java/org/apache/tika/server/metrics/Log4j2Metrics.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/XMLErrorLogUpdater.java
* (delete) tika-langdetect/src/test/resources/log4j.properties
* (delete) tika-app/src/test/resources/log4j_batch_process_test.properties
* (edit) tika-nlp/pom.xml
* (add) tika-fuzzing/src/main/resources/log4j2.xml
* (edit) tika-parent/pom.xml
* (delete) tika-batch/src/test/resources/log4j.properties
* (delete) tika-parsers/src/test/resources/log4j.properties
* (edit) tika-app/src/main/java/org/apache/tika/cli/BatchCommandLineBuilder.java
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (add) tika-app/src/test/resources/log4j2_batch_process_test.properties
* (edit) tika-server/bin/tika.in.sh
* (edit) tika-batch/src/test/java/org/apache/tika/batch/fs/FSBatchTestBase.java
* (delete) tika-eval/src/main/resources/log4j.properties
* (delete) tika-app/src/main/resources/log4j.properties
* (add) tika-server/src/test/resources/log4j2.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/XMLLogReader.java


> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3585) General updates for 2.1.1

2021-12-14 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459507#comment-17459507
 ] 

Hudson commented on TIKA-3585:
--

UNSTABLE: Integrated in Jenkins build Tika » tika-branch1x-jdk8 #152 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-branch1x-jdk8/152/])
TIKA-3585 -- general upgrades (tallison: 
[https://github.com/apache/tika/commit/df6fdd7c0277a21c9a533634869b8ad9b4c303e4])
* (edit) tika-example/pom.xml
* (edit) tika-parent/pom.xml


> General updates for 2.1.1
> -
>
> Key: TIKA-3585
> URL: https://issues.apache.org/jira/browse/TIKA-3585
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Trivial
>






Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Lewis John McGibbney
I'll submit a PR for the README but I think it's also worthwhile to augment the 
release management guide so that the message to review the release candidate 
includes this information.
lewismc

On 2021/12/14 20:17:05 Tim Allison wrote:
> Y, you're right. Lewis, where should we mention the Docker requirement
> on our site?
> 
> On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney  
> wrote:
> >
> > Hi Ken,
> >
> > On 2021/12/13 22:38:49 Ken Krugler wrote:
> > > That error looks like you’ve got a connection issue with the Maven 
> > > central repo…
> > >
> > > — Ken
> >
> > Yes you are correct :)
> >
> > Once that issue sorted itself out my local build passed so my +1 stands.
> >
> > I think it is worthwhile us stating that Docker is a prerequisite for 
> > installing from source. This is required for the tika-pipes* modules.
> >
> > lewismc
> 


Re: release 1.28?

2021-12-14 Thread Tim Allison
As with 2.x, we can make a 1.28.1 release with the updated PDFBox
early in the new year.

On Tue, Dec 14, 2021 at 3:50 PM Tim Allison  wrote:
>
> All,
>   We upgraded to log4j 2.16.0 in the 1.x branch and upgraded a few
> other dependencies that ossindex flagged as vulnerable.  Given the
> breaking changes in migrating from log4j to log4j2, I've gone with the
> notion that the next 1.x release should be 1.28, not 1.27.1.
>   Once Subhajit has a chance to review the log4j2 mods around
> monitoring in tika server, should I roll a release candidate for 1.28?
>
>Best,
>
>  Tim


release 1.28?

2021-12-14 Thread Tim Allison
All,
  We upgraded to log4j 2.16.0 in the 1.x branch and upgraded a few
other dependencies that ossindex flagged as vulnerable.  Given the
breaking changes in migrating from log4j to log4j2, I've gone with the
notion that the next 1.x release should be 1.28, not 1.27.1.
  Once Subhajit has a chance to review the log4j2 mods around
monitoring in tika server, should I roll a release candidate for 1.28?

   Best,

 Tim


[jira] [Commented] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459463#comment-17459463
 ] 

Tim Allison commented on TIKA-3618:
---

I just pushed the updates to the {{branch_1x}}.  Please edit as you see fit.

> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Tim Allison
Y, you're right. Lewis, where should we mention the Docker requirement
on our site?

On Tue, Dec 14, 2021 at 3:06 PM Lewis John McGibbney  wrote:
>
> Hi Ken,
>
> On 2021/12/13 22:38:49 Ken Krugler wrote:
> > That error looks like you’ve got a connection issue with the Maven central 
> > repo…
> >
> > — Ken
>
> Yes you are correct :)
>
> Once that issue sorted itself out my local build passed so my +1 stands.
>
> I think it is worthwhile us stating that Docker is a prerequisite for 
> installing from source. This is required for the tika-pipes* modules.
>
> lewismc


[jira] [Commented] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459452#comment-17459452
 ] 

Tim Allison commented on TIKA-3618:
---

Great!  As long as you're ok w the change! :D

> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Comment Edited] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459434#comment-17459434
 ] 

Tim Allison edited comment on TIKA-3618 at 12/14/21, 8:06 PM:
--

This is going to be majorly breaking for the micrometer/prometheus/jmx 
component in tika-server cc [~subhajitdas298].

I'm relying mostly on this: 
https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/logging/Log4j2Metrics.java

I'm implementing the existing logic to take the combined filters' result as 
_the_ result.  Will push full update shortly to the {{branch_1x}} branch.


was (Author: talli...@mitre.org):
This is going to be majorly breaking for the micrometer/prometheus/jmx 
component in tika-server cc [~subhajitdas298].

I'm relying mostly on this: 
https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/logging/Log4j2Metrics.java

I'm implementing the existing logic to take the combined filters' result as 
_the_ result.  Will push full update shortly the the branch_1x branch.

> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






Re: [VOTE] Release Apache Tika 2.2.0 Candidate #1

2021-12-14 Thread Lewis John McGibbney
Hi Ken,

On 2021/12/13 22:38:49 Ken Krugler wrote:
> That error looks like you’ve got a connection issue with the Maven central 
> repo…
> 
> — Ken

Yes you are correct :)

Once that issue sorted itself out my local build passed so my +1 stands.

I think it is worthwhile us stating that Docker is a prerequisite for installing 
from source. This is required for the tika-pipes* modules.

lewismc


[jira] [Commented] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-14 Thread Subhajit Das (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459445#comment-17459445
 ] 

Subhajit Das commented on TIKA-3618:


Hi [~tallison],

Yes, Log4j2 is supported out of the box in Micrometer, via the file you 
mentioned. 

I can take care of the changes required to support Log4j2 once the changeover 
is complete.

> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Comment Edited] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459434#comment-17459434
 ] 

Tim Allison edited comment on TIKA-3618 at 12/14/21, 7:34 PM:
--

This is going to be majorly breaking for the micrometer/prometheus/jmx 
component in tika-server cc [~subhajitdas298].

I'm relying mostly on this: 
https://github.com/micrometer-metrics/micrometer/blob/main/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/logging/Log4j2Metrics.java

I'm implementing the existing logic to take the combined filters' result as 
_the_ result.  Will push full update shortly the the branch_1x branch.


was (Author: talli...@mitre.org):
This is going to be majorly breaking for the micrometer/prometheus/jmx 
component in tika-server cc [~subhajitdas298].

> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






[jira] [Commented] (TIKA-3618) Upgrade to log4j2 in 1.x branch

2021-12-14 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459434#comment-17459434
 ] 

Tim Allison commented on TIKA-3618:
---

This is going to be majorly breaking for the micrometer/prometheus/jmx 
component in tika-server cc [~subhajitdas298].

> Upgrade to log4j2 in 1.x branch
> ---
>
> Key: TIKA-3618
> URL: https://issues.apache.org/jira/browse/TIKA-3618
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Major
>






Re: Log4j 2.16.0 a more complete fix to Log4Shell

2021-12-14 Thread Tim Allison
This is the issue solved by 2.16.0:
https://www.cve.org/CVERecord?id=CVE-2021-45046

I think that 2.15.0 is probably good enough for now.  We can upgrade
to 2.16.0 in 2.2.1, when we upgrade PDFBox and POI early in the new
year.
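
As a rough aid for checking whether a resolved log4j version meets a
chosen minimum (2.15.0 or 2.16.0 in this discussion), here is a small
sketch. The helper name `version_ge` is hypothetical, and it assumes GNU
`sort` with the `-V` (version sort) option.

```shell
# version_ge A B
# Succeeds when dotted version A is greater than or equal to B.
# Relies on GNU `sort -V`; hypothetical helper for illustration only.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# To see which log4j artifacts a Maven build resolves (not run here):
#   mvn dependency:tree -Dincludes=org.apache.logging.log4j
```

For example, `version_ge 2.16.0 2.15.0` succeeds, while
`version_ge 2.9.1 2.15.0` fails because version sort treats 9 as less
than 15.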

If anyone has a technical reason to think we should respin 2.2.0-rc1,
please vote/let us know.

Thank you, all!

Cheers,

 Tim

On Mon, Dec 13, 2021 at 7:59 PM Tim Allison  wrote:
>
> I'll dig deeper tomorrow, but I think we're ok with 2.15. I like what
> they've done with 2.16.0. :D
>
> On Mon, Dec 13, 2021 at 7:57 PM Dave Fisher  wrote:
> >
> > You’ll need to evaluate that yourself.
> >
> > Sent from my iPhone
> >
> > > On Dec 13, 2021, at 4:56 PM, Tim Allison  wrote:
> > >
> > > Do we have to do a respin of the release candidate or is this marginally 
> > > better?
> > >
> > >> On Mon, Dec 13, 2021 at 7:43 PM Dave Fisher  wrote:
> > >>
> > >> https://lists.apache.org/thread/d6v4r6nosxysyq9rvnr779336yf0woz4
> >


[jira] [Commented] (TIKA-3417) Running tika-docker as non-root user

2021-12-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459284#comment-17459284
 ] 

ASF GitHub Bot commented on TIKA-3417:
--

wjwilson-ibm commented on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-993718392


   Hi @dameikle I can confirm the image logicalspark/docker-tikaserver:1.27 
changed yesterday.  Thank you.  Unfortunately, shelling into the running 
container shows user root and the server running as root.  This is the same as 
before.  See the details below.
   
   ```
   $ docker pull logicalspark/docker-tikaserver:1.27
   1.27: Pulling from logicalspark/docker-tikaserver
   7b1a6ab2e44d: Pull complete 
   778c5d10e7f8: Pull complete 
   a47716074ba4: Pull complete 
   3616900ae8b7: Pull complete 
   2e227631943d: Pull complete 
   bc6f84d523ee: Pull complete 
   Digest: 
sha256:73149cc5c9f5376ade4a19d2fbf3f4d8c8ec7d219f06300cc9148df4ccc1277e
   Status: Downloaded newer image for logicalspark/docker-tikaserver:1.27
   docker.io/logicalspark/docker-tikaserver:1.27
   
   $ docker run -ti logicalspark/docker-tikaserver bash
   Dec 14, 2021 4:21:29 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   Dec 14, 2021 4:21:29 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: Tesseract OCR is installed and will be automatically applied to 
image files unless
   you've excluded the TesseractOCRParser from the default parser.
   Tesseract may dramatically slow down content extraction (TIKA-2359).
   As of Tika 1.15 (and prior versions), Tesseract is automatically called.
   In future versions of Tika, users may need to turn the TesseractOCRParser on 
via TikaConfig.
   Dec 14, 2021 4:21:29 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: org.xerial's sqlite-jdbc is not loaded.
   Please provide the jar on your classpath to parse sqlite files.
   See tika-parsers/pom.xml for the correct version.
   INFO  Starting Apache Tika 1.27 server
   INFO  Setting the server's publish address to be http://0.0.0.0:9998/
   INFO  Logging initialized @1778ms to org.eclipse.jetty.util.log.Slf4jLog
   INFO  jetty-9.4.41.v20210516; built: 2021-05-16T23:56:28.993Z; git: 
98607f93c7833e7dc59489b13f3cb0a114fb9f4c; jvm 14.0.2+12-Ubuntu-120.04
   INFO  Started ServerConnector@64beebb7{HTTP/1.1, (http/1.1)}{0.0.0.0:9998}
   INFO  Started @1896ms
   WARN  Empty contextPath
   INFO  Started o.e.j.s.h.ContextHandler@1922e6d{/,null,AVAILABLE}
   INFO  Started Apache Tika server at http://0.0.0.0:9998/
   
   $ docker ps
   CONTAINER ID   IMAGE                            COMMAND                  CREATED          STATUS          PORTS      NAMES
   b6f4db72897a   logicalspark/docker-tikaserver   "/bin/sh -c 'exec ja…"   18 seconds ago   Up 18 seconds   9998/tcp   eager_kilby
   
   $ docker exec -ti b6f4db72897a bash
   root@b6f4db72897a:/# whoami
   root
   
   root@b6f4db72897a:/# ps -ef
   UID   PID  PPID  C STIME TTY      TIME     CMD
   root    1     0  3 16:21 pts/0    00:00:03 java -jar /tika-server-1.27.jar -h 0.0.0.0 bash
   root   45     0  0 16:21 pts/1    00:00:00 bash
   root   55    45  0 16:23 pts/1    00:00:00 ps -ef
   ```
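
The same check can be done without shelling into the container, by
reading the user recorded in the image metadata. The sketch below is an
illustration only: `runs_as_root` is a hypothetical helper, and it only
recognises the common spellings of root (empty, `0`, `root`, with an
optional `:group` suffix).

```shell
# runs_as_root USER_SETTING
# Succeeds when a Docker "User" setting means the container runs as root.
# An empty value means the image sets no USER, so Docker defaults to root.
# (Hypothetical helper; other spellings of UID 0 are not handled.)
runs_as_root() {
  user="${1%%:*}"   # strip an optional ":group" suffix
  case "$user" in
    ""|0|root) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage against a real image (requires Docker; not run here):
#   runs_as_root "$(docker inspect --format '{{.Config.User}}' \
#       logicalspark/docker-tikaserver:1.27)" && echo "image runs as root"
# Overriding the user at run time, as a workaround:
#   docker run --rm --user 1000:1000 logicalspark/docker-tikaserver:1.27
```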


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Running tika-docker as non-root user
> 
>
> Key: TIKA-3417
> URL: https://issues.apache.org/jira/browse/TIKA-3417
> Project: Tika
>  Issue Type: Improvement
>  Components: docker, tika-docker
>Reporter: Lewis John McGibbney
>Assignee: Philip Southam
>Priority: Major
>
> The PR and context can be found at 
> https://github.com/apache/tika-docker/pull/4





[GitHub] [tika-docker] wjwilson-ibm commented on pull request #4: [TIKA-3417] Running tika-docker as non-root user

2021-12-14 Thread GitBox


wjwilson-ibm commented on pull request #4:
URL: https://github.com/apache/tika-docker/pull/4#issuecomment-993718392


   Hi @dameikle I can confirm the image logicalspark/docker-tikaserver:1.27 
changed yesterday.  Thank you.  Unfortunately, shelling into the running 
container shows user root and the server running as root.  This is the same as 
before.  See the details below.
   
   ```
   $ docker pull logicalspark/docker-tikaserver:1.27
   1.27: Pulling from logicalspark/docker-tikaserver
   7b1a6ab2e44d: Pull complete 
   778c5d10e7f8: Pull complete 
   a47716074ba4: Pull complete 
   3616900ae8b7: Pull complete 
   2e227631943d: Pull complete 
   bc6f84d523ee: Pull complete 
   Digest: 
sha256:73149cc5c9f5376ade4a19d2fbf3f4d8c8ec7d219f06300cc9148df4ccc1277e
   Status: Downloaded newer image for logicalspark/docker-tikaserver:1.27
   docker.io/logicalspark/docker-tikaserver:1.27
   
   $ docker run -ti logicalspark/docker-tikaserver bash
   Dec 14, 2021 4:21:29 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
   See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
   for optional dependencies.
   Dec 14, 2021 4:21:29 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: Tesseract OCR is installed and will be automatically applied to 
image files unless
   you've excluded the TesseractOCRParser from the default parser.
   Tesseract may dramatically slow down content extraction (TIKA-2359).
   As of Tika 1.15 (and prior versions), Tesseract is automatically called.
   In future versions of Tika, users may need to turn the TesseractOCRParser on 
via TikaConfig.
   Dec 14, 2021 4:21:29 PM org.apache.tika.config.InitializableProblemHandler$3 
handleInitializableProblem
   WARNING: org.xerial's sqlite-jdbc is not loaded.
   Please provide the jar on your classpath to parse sqlite files.
   See tika-parsers/pom.xml for the correct version.
   INFO  Starting Apache Tika 1.27 server
   INFO  Setting the server's publish address to be http://0.0.0.0:9998/
   INFO  Logging initialized @1778ms to org.eclipse.jetty.util.log.Slf4jLog
   INFO  jetty-9.4.41.v20210516; built: 2021-05-16T23:56:28.993Z; git: 
98607f93c7833e7dc59489b13f3cb0a114fb9f4c; jvm 14.0.2+12-Ubuntu-120.04
   INFO  Started ServerConnector@64beebb7{HTTP/1.1, (http/1.1)}{0.0.0.0:9998}
   INFO  Started @1896ms
   WARN  Empty contextPath
   INFO  Started o.e.j.s.h.ContextHandler@1922e6d{/,null,AVAILABLE}
   INFO  Started Apache Tika server at http://0.0.0.0:9998/
   
   $ docker ps
   CONTAINER ID   IMAGE                            COMMAND                  CREATED          STATUS          PORTS      NAMES
   b6f4db72897a   logicalspark/docker-tikaserver   "/bin/sh -c 'exec ja…"   18 seconds ago   Up 18 seconds   9998/tcp   eager_kilby
   
   $ docker exec -ti b6f4db72897a bash
   root@b6f4db72897a:/# whoami
   root
   
   root@b6f4db72897a:/# ps -ef
   UID   PID  PPID  C STIME TTY      TIME     CMD
   root    1     0  3 16:21 pts/0    00:00:03 java -jar /tika-server-1.27.jar -h 0.0.0.0 bash
   root   45     0  0 16:21 pts/1    00:00:00 bash
   root   55    45  0 16:23 pts/1    00:00:00 ps -ef
   ```

