[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-11 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r870487942


##
oak-run/src/main/java/org/apache/jackrabbit/oak/plugins/tika/TextExtractor.java:
##
@@ -270,22 +272,24 @@ public InputStream get() {
 // not being present. This is equivalent to disabling
 // selected media types in configuration, so we can simply
 // ignore these errors.
-log.debug("Failed to extract text from a binary property: {}."
+String format = "Failed to extract text from a binary property: {}."
 + " This often happens when some media types are disabled by configuration."
-+ " The stack trace is included to flag some 'unintended' failures",
-path, e);
++ " The stack trace is included to flag some 'unintended' failures";
+log.warn(format, linkageErrorFound ? path : new Object[]{path, e});
+linkageErrorFound = true;
 parserErrorCount.incrementAndGet();
 return ERROR_TEXT;
 } catch (Throwable t) {
 // Capture and report any other full text extraction problems.
 // The special STOP exception is used for normal termination.
 if (!handler.isWriteLimitReached(t)) {
 parserErrorCount.incrementAndGet();
-parserError.debug("Failed to extract text from a binary property: "
-+ path
+String format = "Failed to extract text from a binary property: {}"
 + " This is a fairly common case, and nothing to"
 + " worry about. The stack trace is included to"
-+ " help improve the text extraction feature.", t);
++ " help improve the text extraction feature.";
+parserError.warn(format, throwableErrorFound ? path : new Object[]{path, t});

Review Comment:
   done



##
oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/binary/FulltextBinaryTextExtractor.java:
##
@@ -193,13 +196,14 @@ public Void call() throws Exception {
   // Capture and report any other full text extraction problems.
   // The special STOP exception is used for normal termination.
   if (!handler.isWriteLimitReached(t)) {
-log.debug(
-"[{}] Failed to extract text from a binary property: {}."
+String format = "[{}] Failed to extract text from a binary property: {}."
 + " This is a fairly common case, and nothing to"
 + " worry about. The stack trace is included to"
-+ " help improve the text extraction feature.",
-getIndexName(), path, t);
++ " help improve the text extraction feature.";
+String indexName = getIndexName();
+log.warn(format, throwableErrorFound ? new Object[]{indexName, path} : new Object[]{indexName, path, t});

Review Comment:
   done
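
The hunks above log the stack trace only for the first failure and then set a flag. A minimal sketch of that idea follows, written without SLF4J so it runs on its own. The `FirstFailureLogger` class and its `Sink` interface are hypothetical names, not part of Oak. The sketch branches explicitly instead of using a ternary: as far as I can tell, a ternary mixing `String` and `Object[]` has static type `Object`, so Java would bind it to the single-argument logging overload, which is worth double-checking against the SLF4J version in use.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper (not Oak code): passes the stack trace along only once. */
class FirstFailureLogger {

    /** Stand-in for a real logger so the sketch has no external dependency. */
    interface Sink {
        void warn(String message, Throwable cause); // cause may be null
    }

    private final Sink sink;
    // Thread-safe replacement for a plain boolean "errorFound" field.
    private final AtomicBoolean traceLogged = new AtomicBoolean(false);

    FirstFailureLogger(Sink sink) {
        this.sink = sink;
    }

    void warnFailure(String path, Throwable t) {
        String message = "Failed to extract text from a binary property: " + path;
        // compareAndSet succeeds only for the first caller, so only the
        // first failure carries the throwable (and hence the stack trace).
        if (traceLogged.compareAndSet(false, true)) {
            sink.warn(message, t);
        } else {
            sink.warn(message, null);
        }
    }
}
```

Using an `AtomicBoolean` is a design choice of the sketch: extraction may run on several threads, and a plain boolean field could let more than one thread log the full trace.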



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-06 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r867283602


##
oak-run/src/main/java/org/apache/jackrabbit/oak/plugins/tika/TextExtractor.java:
##
@@ -270,22 +272,22 @@ public InputStream get() {
 // not being present. This is equivalent to disabling
 // selected media types in configuration, so we can simply
 // ignore these errors.
-log.debug("Failed to extract text from a binary property: {}."
+String format = "Failed to extract text from a binary property: {}."
 + " This often happens when some media types are disabled by configuration."
-+ " The stack trace is included to flag some 'unintended' failures",
-path, e);
++ " The stack trace is included to flag some 'unintended' failures";
+log.warn(format, linkageErrorFound ? path : new Object[]{path, e});
 parserErrorCount.incrementAndGet();
 return ERROR_TEXT;
 } catch (Throwable t) {
 // Capture and report any other full text extraction problems.
 // The special STOP exception is used for normal termination.
 if (!handler.isWriteLimitReached(t)) {
 parserErrorCount.incrementAndGet();
-parserError.debug("Failed to extract text from a binary property: "
-+ path
+String format = "Failed to extract text from a binary property: {}"
 + " This is a fairly common case, and nothing to"
 + " worry about. The stack trace is included to"
-+ " help improve the text extraction feature.", t);
++ " help improve the text extraction feature.";
+parserError.warn(format, throwableErrorFound ? path : new Object[]{path, t});

Review Comment:
   Do we still want to update it to info?






[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-06 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r867283424


##
oak-run/src/main/java/org/apache/jackrabbit/oak/plugins/tika/TextExtractor.java:
##
@@ -270,22 +272,22 @@ public InputStream get() {
 // not being present. This is equivalent to disabling
 // selected media types in configuration, so we can simply
 // ignore these errors.
-log.debug("Failed to extract text from a binary property: {}."
+String format = "Failed to extract text from a binary property: {}."
 + " This often happens when some media types are disabled by configuration."
-+ " The stack trace is included to flag some 'unintended' failures",
-path, e);
++ " The stack trace is included to flag some 'unintended' failures";
+log.warn(format, linkageErrorFound ? path : new Object[]{path, e});
 parserErrorCount.incrementAndGet();
 return ERROR_TEXT;
 } catch (Throwable t) {
 // Capture and report any other full text extraction problems.
 // The special STOP exception is used for normal termination.
 if (!handler.isWriteLimitReached(t)) {
 parserErrorCount.incrementAndGet();
-parserError.debug("Failed to extract text from a binary property: "
-+ path
+String format = "Failed to extract text from a binary property: {}"
 + " This is a fairly common case, and nothing to"
 + " worry about. The stack trace is included to"
-+ " help improve the text extraction feature.", t);
++ " help improve the text extraction feature.";
+parserError.warn(format, throwableErrorFound ? path : new Object[]{path, t});

Review Comment:
   Correct, I believe the error covers any sort of failure here, so not necessarily just a missing jar.
   And no, we do not have any test case for this.
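
To illustrate the point about the second handler, here is a small sketch of the two-handler layout seen in the hunk. It assumes the first handler catches `LinkageError`, as the `linkageErrorFound` flag name suggests. The class `TwoHandlerSketch` and its return labels are hypothetical: the real code returns `ERROR_TEXT` from both branches, and the labels here differ only to show which handler ran.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch (not Oak code) of the two catch blocks in the hunk above. */
class TwoHandlerSketch {

    static String extract(Callable<String> parser) {
        try {
            return parser.call();
        } catch (LinkageError e) {
            // Typically a missing or incompatible jar, e.g. NoClassDefFoundError.
            return "LINKAGE";
        } catch (Throwable t) {
            // Everything else: parse errors, I/O problems, runtime exceptions.
            return "OTHER";
        }
    }
}
```

Note that `ClassNotFoundException` is a checked exception, not a `LinkageError`, so it would land in the second handler in this sketch.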






[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-06 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r867235841


##
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/run/cli/CommonOptions.java:
##
@@ -36,13 +36,15 @@ public class CommonOptions implements OptionsBean {
 private final OptionSpec nonOption;
 private final OptionSpec metrics;
 private final OptionSpec segment;
+private final OptionSpec continueMissingDep;
 private OptionSet options;
 
 public CommonOptions(OptionParser parser){
 help = parser.acceptsAll(asList("h", "?", "help"), "Show help").forHelp();
 readWriteOption = parser.accepts("read-write", "Connect to repository in read-write mode");
 metrics = parser.accepts("metrics", "Enables metrics based statistics collection");
 segment = parser.accepts("segment", "Use older oak-segment support");
+continueMissingDep = parser.accepts("continue-missing-tika-dep", "Continue to run when there are missing dependency");

Review Comment:
   changed.






[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-06 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r867233951


##
oak-run/src/main/java/org/apache/jackrabbit/oak/plugins/tika/TextExtractor.java:
##
@@ -270,22 +272,22 @@ public InputStream get() {
 // not being present. This is equivalent to disabling
 // selected media types in configuration, so we can simply
 // ignore these errors.
-log.debug("Failed to extract text from a binary property: {}."
+String format = "Failed to extract text from a binary property: {}."
 + " This often happens when some media types are disabled by configuration."
-+ " The stack trace is included to flag some 'unintended' failures",
-path, e);
++ " The stack trace is included to flag some 'unintended' failures";
+log.warn(format, linkageErrorFound ? path : new Object[]{path, e});
 parserErrorCount.incrementAndGet();
 return ERROR_TEXT;
 } catch (Throwable t) {
 // Capture and report any other full text extraction problems.
 // The special STOP exception is used for normal termination.
 if (!handler.isWriteLimitReached(t)) {
 parserErrorCount.incrementAndGet();
-parserError.debug("Failed to extract text from a binary property: "
-+ path
+String format = "Failed to extract text from a binary property: {}"
 + " This is a fairly common case, and nothing to"
 + " worry about. The stack trace is included to"
-+ " help improve the text extraction feature.", t);
++ " help improve the text extraction feature.";
+parserError.warn(format, throwableErrorFound ? path : new Object[]{path, t});
 return ERROR_TEXT;

Review Comment:
   done



##
oak-run/src/main/java/org/apache/jackrabbit/oak/plugins/tika/TextExtractor.java:
##
@@ -270,22 +272,22 @@ public InputStream get() {
 // not being present. This is equivalent to disabling
 // selected media types in configuration, so we can simply
 // ignore these errors.
-log.debug("Failed to extract text from a binary property: {}."
+String format = "Failed to extract text from a binary property: {}."
 + " This often happens when some media types are disabled by configuration."
-+ " The stack trace is included to flag some 'unintended' failures",
-path, e);
++ " The stack trace is included to flag some 'unintended' failures";
+log.warn(format, linkageErrorFound ? path : new Object[]{path, e});
 parserErrorCount.incrementAndGet();

Review Comment:
   done






[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-06 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r867233642


##
oak-run/src/main/java/org/apache/jackrabbit/oak/plugins/tika/TextExtractor.java:
##
@@ -79,6 +79,8 @@ class TextExtractor implements Closeable {
 private boolean initialized;
 private BinaryStats stats;
 private boolean closed;
+private boolean linkageErrorFound = false;

Review Comment:
   done



##
oak-search/src/main/java/org/apache/jackrabbit/oak/plugins/index/search/spi/binary/FulltextBinaryTextExtractor.java:
##
@@ -74,6 +74,8 @@ public class FulltextBinaryTextExtractor {
   private final boolean reindex;
   private Parser parser;
   private TikaConfigHolder tikaConfig;
+  private boolean linkageErrorFound = false;

Review Comment:
   done






[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-06 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r867232958


##
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/run/cli/CommonOptions.java:
##
@@ -55,6 +57,8 @@ public boolean isHelpRequested(){
 return options.has(help);
 }
 
+public boolean isContinueMissingDep() { return options.has(continueMissingDep); }

Review Comment:
   done



##
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/run/cli/Options.java:
##
@@ -80,6 +80,15 @@ public OptionSet parseAndConfigure(OptionParser parser, String[] args, boolean c
 if (checkNonOptions) {
 checkNonOptions();
 }
+CommonOptions commonOpts = getCommonOpts();
+if (!commonOpts.isHelpRequested() && !commonOpts.isContinueMissingDep()) {

Review Comment:
   moved
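
The body of the guarded block is not shown in this hunk, so the following is only a guess at the general shape of such a startup check: probe for a list of class names and fail fast unless help was requested or the continue flag was given. The class `DependencyCheck`, both method names, and the idea of probing by class name are assumptions made for illustration, not the PR's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not Oak code) of a fail-fast dependency check. */
class DependencyCheck {

    /** Returns the class names that cannot be loaded. */
    static List<String> findMissing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only test loadability, run no static initializers.
                Class.forName(name, false, DependencyCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Mirrors the guard in the hunk: skip the check for help or the continue flag. */
    static void failIfMissing(List<String> classNames,
                              boolean helpRequested,
                              boolean continueMissingDep) {
        if (helpRequested || continueMissingDep) {
            return;
        }
        List<String> missing = findMissing(classNames);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Missing dependencies: " + missing);
        }
    }
}
```

A caller would pass the class names it depends on; which Tika classes to probe is left open here because the source does not say.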






[GitHub] [jackrabbit-oak] Ewocker commented on a diff in pull request #561: OAK-9758 error out if tika dependencies are missing and improve loggi…

2022-05-05 Thread GitBox


Ewocker commented on code in PR #561:
URL: https://github.com/apache/jackrabbit-oak/pull/561#discussion_r866419970


##
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/run/cli/CommonOptions.java:
##
@@ -36,13 +36,15 @@ public class CommonOptions implements OptionsBean {
 private final OptionSpec nonOption;
 private final OptionSpec metrics;
 private final OptionSpec segment;
+private final OptionSpec continueMissingDep;
 private OptionSet options;
 
 public CommonOptions(OptionParser parser){
 help = parser.acceptsAll(asList("h", "?", "help"), "Show help").forHelp();
 readWriteOption = parser.accepts("read-write", "Connect to repository in read-write mode");
 metrics = parser.accepts("metrics", "Enables metrics based statistics collection");
 segment = parser.accepts("segment", "Use older oak-segment support");
+continueMissingDep = parser.accepts("continue-missing-dep", "Continue to run when there are missing dependency");

Review Comment:
   updated


