[jira] [Commented] (TIKA-3721) DGN parser

2022-04-23 Thread Dan Coldrick (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526803#comment-17526803
 ] 

Dan Coldrick commented on TIKA-3721:


Is this along the right lines?
{code:java}
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.tika.parser.dgn;

import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.Set;

import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.CloseShieldInputStream;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AbstractParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.SummaryExtractor;
import org.apache.tika.sax.XHTMLContentHandler;

/**
 * DGN (CAD Drawing) parser. This is a very basic parser, which just looks for
 * bits of the headers.
 */
public class DGNParser extends AbstractParser {

    private static final long serialVersionUID = 311571157668507304L;

    private static final MediaType TYPE = MediaType.image("vnd.dgn");

    @Override
    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return Collections.singleton(TYPE);
    }

    @Override
    public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
                      ParseContext context) throws IOException, SAXException, TikaException {
        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();

        SummaryExtractor summaryExtractor = new SummaryExtractor(metadata);
        final DirectoryNode root;
        TikaInputStream tstream = TikaInputStream.cast(stream);
        // Only set when this method opens the file system itself and has to close it
        POIFSFileSystem mustCloseFs = null;
        try {
            if (tstream == null) {
                // Plain stream: shield it so that closing the file system
                // does not close the caller's stream
                mustCloseFs = new POIFSFileSystem(new CloseShieldInputStream(stream));
                root = mustCloseFs.getRoot();
            } else {
                // Reuse a container that was already opened on this stream, if any
                final Object container = tstream.getOpenContainer();
                if (container instanceof POIFSFileSystem) {
                    root = ((POIFSFileSystem) container).getRoot();
                } else if (container instanceof DirectoryNode) {
                    root = (DirectoryNode) container;
                } else {
                    final POIFSFileSystem fs;
                    if (tstream.hasFile()) {
                        // Open the backing file read-only
                        fs = new POIFSFileSystem(tstream.getFile(), true);
                    } else {
                        fs = new POIFSFileSystem(new CloseShieldInputStream(tstream));
                    }
                    // tstream will close the fs, no need to close this below
                    tstream.setOpenContainer(fs);
                    root = fs.getRoot();
                }
            }
            // Copy the OLE2 summary properties into the Tika metadata
            summaryExtractor.parseSummaries(root);
        } finally {
            IOUtils.closeQuietly(mustCloseFs);
        }
        xhtml.endDocument();
    }
}
  {code}
I know I'm not handling V7 files yet, but it does appear to output V8 metadata 
at least. If we have it in its own parser, there is the option to extend it for 
V7. Again, I'm not really a proper Java developer and can just hack my way 
around to get stuff working, so any feedback would be good.
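
Since the parser above hands the stream to POIFSFileSystem, it can only deal with files that are OLE2 containers. A minimal sketch of how a single parser could decide up front whether a file is an OLE2 container, by comparing the first eight bytes with the OLE2 header signature, follows. The class and method names are hypothetical and not part of Tika, and it is only an assumption here that V7 files would fall into the non-OLE2 branch.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Tika: tells OLE2 containers from other files. */
final class Ole2Sniffer {

    // The 8-byte signature at the start of every OLE2 compound document
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the given leading bytes of a file start with the OLE2 signature. */
    public static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_SIGNATURE.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    public static void main(String[] args) {
        byte[] ole2 = {
            (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1, 0x00, 0x00
        };
        // Arbitrary bytes standing in for any file that is not an OLE2 container
        byte[] other = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};

        System.out.println(looksLikeOle2(ole2));
        System.out.println(looksLikeOle2(other));
    }
}
```

A parser could call such a check on the first bytes of the stream and only take the POIFS path when it returns true, leaving room for a separate branch later.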

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: Screenshot from 2022-04-22 16-03-44.png, 
> dgn8s-dumped.txt, image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://doc

[jira] [Commented] (TIKA-3721) DGN parser

2022-04-23 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526809#comment-17526809
 ] 

Tim Allison commented on TIKA-3721:
---

>{{SummaryExtractor}} already supports custom properties with the 
>{{Office.USER_DEFINED_METADATA_NAME_PREFIX}} 

Sorry, of course, [~nick]. Thank you!

> DGN parser
> --
>
> Key: TIKA-3721
> URL: https://issues.apache.org/jira/browse/TIKA-3721
> Project: Tika
>  Issue Type: New Feature
>  Components: parser
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Attachments: Screenshot from 2022-04-22 16-03-44.png, 
> dgn8s-dumped.txt, image-2022-04-22-20-00-45-704.png, 
> image-2022-04-22-20-01-09-564.png, image-2022-04-22-20-02-24-180.png
>
>
> Does anyone have any experience with the DGN file format by MicroStation? I 
> see TIKA doesn't have a parser so would it be possible to create one? 
> https://docs.fileformat.com/cad/dgn/



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


tika-main windows build fails in TikaResourceFetcherTest

2022-04-23 Thread Tilman Hausherr
I have been trying unsuccessfully to build tika-main on Windows 10 with JDK 8 
for several weeks. Here are the failures I get:


[ERROR] Failures:
[ERROR] TikaResourceFetcherTest.testHeader:100->CXFTestBase.assertContains:65 hello world not found in:

 ==> expected:  but was: 
[ERROR] TikaResourceFetcherTest.testQueryPart:108->CXFTestBase.assertContains:65 hello world not found in:

 ==> expected:  but was: 
[ERROR] Errors:
[ERROR] TikaServerIntegrationTest.test1WayTLS:341->configure1WayTLS:456 » InvalidPath Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-truststore.p12"
[ERROR] TikaServerIntegrationTest.test2WayTLS:377->configure2WayTLS:428 » InvalidPath Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-keystore.p12"

[INFO]
[ERROR] Tests run: 70, Failures: 2, Errors: 2, Skipped: 4

The two TLS failures are new, but the TikaResourceFetcherTest failures have 
been there for weeks. The immediate cause is that response.getEntity() is an 
empty ByteArrayInputStream, so the response body reads as an empty string.



One output is this:


INFO  [main] 14:23:17,531 
org.apache.tika.pipes.fetcher.fs.FileSystemFetcher A FileSystemFetcher 
(fsf) has been initialized. Clients will be able to read all files under 
'XXX\tika-main\tika-server\tika-server-core\XXXtika-maintika-servertika-server-coretargettest-classestest-documents' 
if this process has permission to read them.


Note that the two XXX here are the same. It's the Windows path where I 
keep my Java projects.


I investigated a bit... FetcherManager.load loads a file from the temp 
directory. Its content is like this:




... license...

    fsf
    XXXtika-maintika-servertika-server-coretargettest-classestest-documents

...

Something goes wrong in

    configXML = configXML.replaceAll("\\$\\{FETCHER_BASE_PATH\\}",
    inputDir.toAbsolutePath().toString());

in TikaResourceFetcherTest.java, so that the backslashes in the path are lost.

The javadoc warns about this

    Note that backslashes (\) and dollar signs ($) in the replacement
    string may cause the results to be different than if it were being
    treated as a literal replacement string


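Using replace("${FETCHER_BASE_PATH}", inputDir.toAbsolutePath().toString()) 
instead of replaceAll() fixes this, because replace() treats both arguments 
literally.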
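
To make the difference concrete, here is a small self-contained sketch. The class name, the template string and the example path are made up for illustration; only the placeholder name comes from the test. It compares replaceAll(), replace(), and replaceAll() combined with Matcher.quoteReplacement() on a Windows-style path.

```java
import java.util.regex.Matcher;

/** Illustration only: how the replacement string is treated by each method. */
final class ReplacementDemo {

    // Hypothetical template; the real test substitutes into an XML config file
    static final String TEMPLATE = "base=${FETCHER_BASE_PATH}";
    static final String PLACEHOLDER_REGEX = "\\$\\{FETCHER_BASE_PATH\\}";

    /** Regex replacement: a backslash in the replacement escapes the next character. */
    public static String withReplaceAll(String template, String path) {
        return template.replaceAll(PLACEHOLDER_REGEX, path);
    }

    /** Literal replacement: neither argument is interpreted. */
    public static String withReplace(String template, String path) {
        return template.replace("${FETCHER_BASE_PATH}", path);
    }

    /** Regex replacement with the replacement string quoted first. */
    public static String withQuoteReplacement(String template, String path) {
        return template.replaceAll(PLACEHOLDER_REGEX, Matcher.quoteReplacement(path));
    }

    public static void main(String[] args) {
        String path = "C:\\work\\tika-main"; // the string C:\work\tika-main

        System.out.println(withReplaceAll(TEMPLATE, path));       // base=C:worktika-main
        System.out.println(withReplace(TEMPLATE, path));          // base=C:\work\tika-main
        System.out.println(withQuoteReplacement(TEMPLATE, path)); // base=C:\work\tika-main
    }
}
```

Either of the last two variants keeps the backslashes; replace() is the shorter change when the placeholder does not need to be a regular expression.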

Related: shouldn't FileSystemFetcher.checkInitialization() check whether 
the path exists?
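
As a rough idea of what such a check could look like, here is a sketch using only java.nio.file. It is not the actual FileSystemFetcher code; the class name, the method name and the exception type are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch, not the actual FileSystemFetcher code. */
final class BasePathCheck {

    /** Returns basePath if it is an existing directory, otherwise throws. */
    public static Path requireExistingDirectory(Path basePath) {
        if (!Files.isDirectory(basePath)) {
            throw new IllegalArgumentException(
                    "basePath does not exist or is not a directory: " + basePath);
        }
        return basePath;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("fetcher-base");
        try {
            // An existing directory passes the check
            System.out.println("accepted: " + requireExistingDirectory(tmp));
            // A path that was never created is rejected
            requireExistingDirectory(tmp.resolve("missing"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With a check like this, the mangled path from the log above would presumably have failed at initialization instead of producing empty responses later.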


Tilman


[jira] [Commented] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526870#comment-17526870
 ] 

Tilman Hausherr commented on TIKA-3719:
---

The test fails on Windows because of the added double quotes:
{noformat}
java.nio.file.InvalidPathException: Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-truststore.p12"
at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
at sun.nio.fs.WindowsPath.parse(WindowsPath.java:94)
at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:255)
at java.nio.file.Paths.get(Paths.java:84)
at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getKeyStore(TLSParameterJaxBUtils.java:152)
at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getTrustManagers(TLSParameterJaxBUtils.java:395)
at org.apache.tika.server.core.TikaServerIntegrationTest.configure1WayTLS(TikaServerIntegrationTest.java:457)
at org.apache.tika.server.core.TikaServerIntegrationTest.test1WayTLS(TikaServerIntegrationTest.java:341)
{noformat}
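
The trace shows Paths.get() receiving a string that still contains the literal quote characters. The cleaner fix is probably not to add the quotes in the first place, but as an illustration, a defensive helper that removes one pair of enclosing double quotes before the value is used as a path might look like the sketch below. The class and method names are hypothetical and not part of Tika or CXF.

```java
/** Hypothetical helper: strips one pair of enclosing double quotes from a value. */
final class QuoteStripper {

    /** Returns value without its enclosing double quotes, or value unchanged if it has none. */
    public static String stripEnclosingQuotes(String value) {
        if (value != null && value.length() >= 2
                && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        // Made-up path, quoted the way the failing test passes it on
        String quoted = "\"C:\\keys\\tika-client-truststore.p12\"";
        System.out.println(stripEnclosingQuotes(quoted));
        // Unquoted values pass through unchanged
        System.out.println(stripEnclosingQuotes("C:\\keys\\tika-client-keystore.p12"));
    }
}
```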


> Tika Server Ability to Run HTTPs
> 
>
> Key: TIKA-3719
> URL: https://issues.apache.org/jira/browse/TIKA-3719
> Project: Tika
>  Issue Type: Wish
>  Components: tika-server
>Affects Versions: 2.3.0
>Reporter: Dan Coldrick
>Priority: Minor
> Fix For: 2.4.0
>
> Attachments: image-2022-04-21-18-52-50-706.png, localhost.jks
>
>
> We need the ability to run TIKA server as a https end point, I can't see 
> anything in the config that allows for this. 
> Looks like I'm not the only one:
> [https://stackoverflow.com/questions/7031/apache-tika-convert-apache-tika-server-rest-endpointsjax-rs-http-to-https]
>  
> If anyone can point to some documentation on how it might be possible it 
> would be really appreciated.
>  
> Thanks





Re: tika-main windows build fails in TikaResourceFetcherTest

2022-04-23 Thread Tilman Hausherr
Same problem also in TikaServerIntegrationTest with replaceAll(); plus 
the problem that I mentioned in TIKA-3719.


[jira] [Comment Edited] (TIKA-3719) Tika Server Ability to Run HTTPs

2022-04-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526870#comment-17526870
 ] 

Tilman Hausherr edited comment on TIKA-3719 at 4/23/22 3:30 PM:


The test fails on Windows because of the added double quotes:
{noformat}
java.nio.file.InvalidPathException: Illegal char <"> at index 0: "XXX\tika-main\tika-server\tika-server-core\target\test-classes\ssl-keys\tika-client-truststore.p12"
at sun.nio.fs.WindowsPathParser.normalize(WindowsPathParser.java:182)
at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:153)
at sun.nio.fs.WindowsPathParser.parse(WindowsPathParser.java:77)
at sun.nio.fs.WindowsPath.parse(WindowsPath.java:94)
at sun.nio.fs.WindowsFileSystem.getPath(WindowsFileSystem.java:255)
at java.nio.file.Paths.get(Paths.java:84)
at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getKeyStore(TLSParameterJaxBUtils.java:152)
at org.apache.cxf.configuration.jsse.TLSParameterJaxBUtils.getTrustManagers(TLSParameterJaxBUtils.java:395)
at org.apache.tika.server.core.TikaServerIntegrationTest.configure1WayTLS(TikaServerIntegrationTest.java:457)
at org.apache.tika.server.core.TikaServerIntegrationTest.test1WayTLS(TikaServerIntegrationTest.java:341)
{noformat}

If I correct getSSL() so that it doesn't quote the filename, and fix the 
replaceAll() bugs I mentioned on the dev mailing list, one test still fails:

org.opentest4j.AssertionFailedError: 
readHandshakeRecord not found in:
java.net.SocketException: SocketException invoking 
https://localhost:/rmeta: Software caused connection abort: recv failed ==> 
expected:  but was: 

I don't know if this is a real test failure or if the JDK just has different 
error messages on Windows.






Re: Code review on use of cxf in Apache Tika?

2022-04-23 Thread Sergey Beryozkin
Hi Tim

Apologies, I'm totally occupied with Quarkus right now; I'm sorry, it
consumes all my time.
Andriy, if you could help the Tika colleagues, that would be great, as
you've helped with integrating Tika in Apache CXF as well. Recall how we
enjoyed the presentation about Tika at one of the ASF conferences :-).

Cheers, Sergey

On Thu, Apr 21, 2022 at 10:55 PM Tim Allison  wrote:

> Friends and colleagues,
>
>   Over on Apache Tika, our server has been using cxf for a long time.
> We've been very happy with its capabilities and robustness.  So, thank
> you!
>
>   Recently we were asked to add TLS, and we managed to do so
> programmatically[0]. The requestor on that issue noted that it would
> be great if we could use the regular cxf.xml file configuration
> process[1].  Further, the requestor noted that if he put a cxf.xml
> file on his class path, a separate jetty server was spun up.  Are
> there better ways we can use CXF and its configuration process?
>   This is how we're initializing the server [2].
>
>Thank you!
>
>   Best,
>
>  Tim
>
> [0]
> https://github.com/apache/tika/blob/TIKA-3719/tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/TikaServerProcess.java#L259
>
> [1]
> https://issues.apache.org/jira/browse/TIKA-3725?focusedCommentId=17526098&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17526098
>
> [2]
> https://github.com/apache/tika/blob/main/tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/TikaServerProcess.java#L234
>


Re: Code review on use of cxf in Apache Tika?

2022-04-23 Thread Andriy Redko
Hi Tim & Sergey,

Yeah, sure, happy to help here. I think I understood the problem, and I will
try to look shortly at how to address it in the context of Tika Server (I have
never used the server-based deployment of Tika).

Best Regards,
Andriy Redko

SB> Hi Tim

SB> Apologies I'm totally occupied with Quarkus right now, I'm sorry it
SB> consumes all the time.
SB> Andriy, if you could help the Tika colleagues then it would be great, as
SB> you've helped with integrating Tika in Apache CXF as well, recall how we
SB> enjoyed the presentation about Tika at one of ASF Conferences :-).

SB> Cheers, Sergey




Re: Code review on use of cxf in Apache Tika?

2022-04-23 Thread Sergey Beryozkin
Hey Andriy

Great stuff, glad to hear it. It is a collection of JAX-RS endpoints backed
by CXF, so the team needs some help to set up HTTPS and Basic authentication
(and possibly bearer JWT token verification going forward). I can help with
clarifying some details related to JWT; CXF has everything related to it...

Cheers, Sergey

On Sat, Apr 23, 2022 at 8:02 PM Andriy Redko  wrote:

> Hi Tim & Sergey,
>
> Yeah, sure, happy to help here. I think I understood the problem, will try to
> look shortly on how to address that in context of Tika Server (I have never
> used the server-based deployment of Tika yet).
>
> Best Regards,
> Andriy Redko


[jira] [Commented] (TIKA-1833) NoClassDefFoundError for POIXMLTypeLoader

2022-04-23 Thread YoungSwift (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526952#comment-17526952
 ] 

YoungSwift commented on TIKA-1833:
--

I can't access  !image-2022-04-24-10-01-00-160.png!

> NoClassDefFoundError for POIXMLTypeLoader
> -
>
> Key: TIKA-1833
> URL: https://issues.apache.org/jira/browse/TIKA-1833
> Project: Tika
>  Issue Type: Bug
>Reporter: M. Manna
>Priority: Major
>
> I downloaded tika-app-1.11.jar which has all the necessary dependencies 
> (checked using 7zip opener and checked the classes). I tried to parse .doc, 
> .docx files for my project, but it is throwing error (not exception). The 
> stack trace is as follows:
> java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader
> at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
> at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:158)
> at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:167)
> at org.apache.poi.xwpf.usermodel.XWPFDocument.(XWPFDocument.java:119)
> at org.apache.poi.xwpf.extractor.XWPFWordExtractor.(XWPFWordExtractor.java:59)
> at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:204)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
> at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at xxx.xxx.xxx.xxx.xAttachmentWithTika(xxxService.java:792)
> I browsed the package and couldn't find any POIXMLTypeLoader class. is this a 
> known issue? Could someone please respond to me?





[jira] [Commented] (TIKA-1833) NoClassDefFoundError for POIXMLTypeLoader

2022-04-23 Thread Tilman Hausherr (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526972#comment-17526972
 ] 

Tilman Hausherr commented on TIKA-1833:
---

This is a closed issue. Please open a new one.
