[jira] [Created] (TIKA-3550) Some DXF files are detected as text/plain

2021-09-12 Thread Robin Schimpf (Jira)
Robin Schimpf created TIKA-3550:
---

 Summary: Some DXF files are detected as text/plain
 Key: TIKA-3550
 URL: https://issues.apache.org/jira/browse/TIKA-3550
 Project: Tika
  Issue Type: Bug
Affects Versions: 2.1.0, 1.27
Reporter: Robin Schimpf


I noticed that Tika fails to detect the file format of the files from 
[https://people.math.sc.edu/Burkardt/data/dxf/dxf.html]

Unlike the included test file (whose test is currently disabled on 2.x), 
those files have 2 spaces before the numbers. The comment in 
tika-mimetypes.xml suggests to me that this should work. It would be nice if 
detection worked with anything from no space to any number of spaces before the number.
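
To illustrate the requested behaviour, here is a minimal sketch of a check that tolerates zero or more leading spaces before the number. It is not Tika's actual magic rule from tika-mimetypes.xml. The pattern and the name looks_like_dxf are hypothetical, and the sketch assumes that an ASCII DXF file starts with group code 0 followed by SECTION on the next line.

```python
import re

# Hypothetical pattern (an assumption, not Tika's rule): optional spaces or
# tabs, the group code 0, a line break, optional indentation, then SECTION.
DXF_START = re.compile(rb"\A[ \t]*0[ \t]*\r?\n[ \t]*SECTION\b")


def looks_like_dxf(head: bytes) -> bool:
    """Return True if the leading bytes resemble the start of an ASCII DXF file."""
    return DXF_START.match(head) is not None


if __name__ == "__main__":
    samples = [b"0\nSECTION\n", b"  0\nSECTION\n", b"plain text\n"]
    for sample in samples:
        print(sample, looks_like_dxf(sample))
```

Files that begin with something other than group code 0 would need additional alternatives in the pattern, which this sketch does not attempt.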



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (TIKA-3550) Some DXF files are detected as text/plain

2021-09-12 Thread Robin Schimpf (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robin Schimpf updated TIKA-3550:

Attachment: Cube FreeCAD.dxf

> Some DXF files are detected as text/plain
> -
>
> Key: TIKA-3550
> URL: https://issues.apache.org/jira/browse/TIKA-3550
> Project: Tika
>  Issue Type: Bug
>Affects Versions: 1.27, 2.1.0
>Reporter: Robin Schimpf
>Priority: Major
> Attachments: Cube FreeCAD.dxf
>
>
> I noticed that Tika fails to detect the file format of the files from 
> [https://people.math.sc.edu/Burkardt/data/dxf/dxf.html]
> Unlike the included test file (whose test is currently disabled on 
> 2.x), those files have 2 spaces before the numbers. The comment in 
> tika-mimetypes.xml suggests to me that this should work. It would be nice if 
> detection worked with anything from no space to any number of spaces before 
> the number.





[jira] [Commented] (TIKA-3550) Some DXF files are detected as text/plain

2021-09-12 Thread Robin Schimpf (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413764#comment-17413764
 ] 

Robin Schimpf commented on TIKA-3550:
-

Found that FreeCAD is able to export DXF files. Created a simple cube and 
exported it. This file differs from the linked files in that the software used 
to create the file is its first entry.






[jira] [Comment Edited] (TIKA-3550) Some DXF files are detected as text/plain

2021-09-12 Thread Robin Schimpf (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413764#comment-17413764
 ] 

Robin Schimpf edited comment on TIKA-3550 at 9/12/21, 7:32 PM:
---

Found that FreeCAD is able to export DXF files. Created a simple cube and 
exported it. This file differs from the linked files in that the software used 
to create the file is its first entry.


was (Author: rschimpf):
Found that FreeCAD is able to export DXF files. Created a simple cube and 
exportet it. This files differs from the linked files as the software used to 
create the file is the first entry there.






[jira] [Commented] (TIKA-3546) Side effects of setting WriteOutContentHandler write limit as -1 are unknown

2021-09-12 Thread Yash Mehta (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413922#comment-17413922
 ] 

Yash Mehta commented on TIKA-3546:
--

How do I send mail to this list? Can you help me out? I have not been able to 
figure it out.

> Side effects of setting WriteOutContentHandler write limit as -1 are unknown
> 
>
> Key: TIKA-3546
> URL: https://issues.apache.org/jira/browse/TIKA-3546
> Project: Tika
>  Issue Type: Improvement
>  Components: core
>Reporter: Yash Mehta
>Priority: Minor
>
> WriteOutContentHandler has a parameterized constructor which can be used to 
> specify the writeLimit. The default seems to be 100,000 characters. Setting 
> this to -1 signifies no writing limit. I want to understand the side effects 
> of keeping it as -1, whether it can cause any memory or performance issues.
>  
> I also want to understand the reason why this write limit was introduced in 
> the first place, any use-case or user scenario due to which this parameter 
> was introduced.
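
As a rough illustration of the trade-off being asked about, the sketch below models a write-limited text buffer. It is not Tika's WriteOutContentHandler. The class BoundedTextBuffer and its methods are hypothetical, and truncating silently while setting a flag is an assumption made for simplicity (the real class may signal the limit differently). It only shows that with a limit of -1 the buffered text grows with the size of the input, which is where memory concerns could arise.

```python
class BoundedTextBuffer:
    """Hypothetical stand-in for a write-limited content handler."""

    def __init__(self, write_limit: int = 100_000):
        self.write_limit = write_limit  # -1 means no limit
        self._parts = []
        self._count = 0
        self.limit_reached = False

    def write(self, text: str) -> None:
        """Append text, truncating once the configured limit is hit."""
        if self.limit_reached:
            return
        if self.write_limit != -1:
            remaining = self.write_limit - self._count
            if len(text) > remaining:
                text = text[:remaining]
                self.limit_reached = True
        self._parts.append(text)
        self._count += len(text)

    def getvalue(self) -> str:
        """Return everything buffered so far."""
        return "".join(self._parts)


if __name__ == "__main__":
    bounded = BoundedTextBuffer(5)
    bounded.write("abcdefgh")
    print(len(bounded.getvalue()), bounded.limit_reached)

    unbounded = BoundedTextBuffer(-1)
    unbounded.write("x" * 200_000)
    print(len(unbounded.getvalue()), unbounded.limit_reached)
```

In this model the bounded buffer never holds more than write_limit characters, while the unbounded one holds whatever the input produces.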


