[jira] [Commented] (TIKA-1335) mime type for CSV files incorrectly detected as text/plain

2014-06-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031681#comment-14031681
 ] 

Hudson commented on TIKA-1335:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #46 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/46/])
Update docs for TIKA-1335 TIKA-1336. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1602618)
* /tika/trunk/CHANGES.txt


> mime type for CSV files incorrectly detected as text/plain
> --
>
> Key: TIKA-1335
> URL: https://issues.apache.org/jira/browse/TIKA-1335
> Project: Tika
>  Issue Type: Bug
>  Components: mime
>Affects Versions: 1.5, 1.6
>Reporter: Kaijian Xu
>Assignee: Chris A. Mattmann
> Fix For: 1.6
>
> Attachments: CDEC_WEATHER_2010_03_02, foo.csv, velocity.csv
>
>
> MIME type autodetection returns "text/plain" for CSV files, for example:
> % tika -m foo.csv
> Content-Encoding: ISO-8859-1
> Content-Length: 78
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: foo.csv
> This occurs regardless of whether the filename has the appropriate *.csv 
> extension or not.
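A regression check for this report is easier to write against the base media type than against the whole Content-Type value, because the charset parameter depends on the input file. Below is a minimal, stdlib-only Java sketch that reads the "Key: Value" lines printed by tika -m, assuming the layout shown above. MetadataDump and its methods are hypothetical helpers, not Tika API.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Tika): parses the "Key: Value" lines that
// `tika -m` prints and splits a Content-Type value into base type and parameters.
final class MetadataDump {

    // Parses each "Key: Value" line; lines without ": " are skipped.
    static Map<String, String> parse(String dump) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            int sep = line.indexOf(": ");
            if (sep > 0) {
                fields.put(line.substring(0, sep).trim(), line.substring(sep + 2).trim());
            }
        }
        return fields;
    }

    // "text/plain; charset=ISO-8859-1" -> "text/plain"
    static String baseType(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the value of a named parameter such as "charset", or null if absent.
    static String parameter(String contentType, String name) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase(name)) {
                return kv[1].trim();
            }
        }
        return null;
    }
}
```

For the output quoted in the report, baseType(parse(dump).get("Content-Type")) yields text/plain, which is the value this issue reports as wrong for a CSV file.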



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (TIKA-1336) Provide a Detector JAXRS endpoint

2014-06-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031680#comment-14031680
 ] 

Hudson commented on TIKA-1336:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #46 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/46/])
Update docs for TIKA-1335 TIKA-1336. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1602618)
* /tika/trunk/CHANGES.txt


> Provide a Detector JAXRS endpoint
> -
>
> Key: TIKA-1336
> URL: https://issues.apache.org/jira/browse/TIKA-1336
> Project: Tika
>  Issue Type: Improvement
>  Components: detector, server
>Affects Versions: 1.5
>Reporter: Nick Burch
>Assignee: Chris A. Mattmann
> Fix For: 1.6
>
>
> As identified in TIKA-1335, the Tika Server now has an endpoint which will
> tell you what Detectors are available to it, but not one that will trigger
> detection. That means your only way to do detection is to request the
> metadata and check the content type, but that isn't always as accurate as an
> explicit detection call (e.g. if a general parser picks up the file).
> We should therefore add a new endpoint that just does the detection.
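A detection-only endpoint of this kind can be exercised with a plain HTTP client. The sketch below uses the Java 11 java.net.http client. The host, port 9998, path /detect/stream and plain-text response body are assumptions (check the tika-server README for your version). The optional filename hint is sent in a Content-Disposition header, which the resolution comment on this issue says is needed for CSV files.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Client sketch for a detection-only endpoint (Java 11+).
// ASSUMPTIONS: server on localhost:9998, endpoint PUT /detect/stream,
// response body is the detected media type as plain text.
final class DetectClient {

    static final URI DETECT_URI = URI.create("http://localhost:9998/detect/stream");

    // Builds (does not send) a PUT whose body is the raw bytes. A non-null
    // filenameHint is sent as Content-Disposition; names containing spaces or
    // quotes would need quoting, which this sketch omits.
    static HttpRequest buildRequest(URI endpoint, byte[] body, String filenameHint) {
        HttpRequest.Builder b = HttpRequest.newBuilder(endpoint)
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body));
        if (filenameHint != null) {
            b.header("Content-Disposition", "attachment; filename=" + filenameHint);
        }
        return b.build();
    }

    // Sends the file and returns the trimmed response body (the media type).
    static String detect(HttpClient client, Path file, String filenameHint) throws Exception {
        HttpRequest req = buildRequest(DETECT_URI, Files.readAllBytes(file), filenameHint);
        HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) {
            throw new IllegalStateException("detect failed: HTTP " + resp.statusCode());
        }
        return resp.body().trim();
    }
}
```

For example, DetectClient.detect(HttpClient.newHttpClient(), Path.of("foo.csv"), "foo.csv") would upload foo.csv with its own name as the hint. buildRequest does no I/O, so it can be checked without a running server.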





[jira] [Commented] (TIKA-1336) Provide a Detector JAXRS endpoint

2014-06-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031673#comment-14031673
 ] 

Hudson commented on TIKA-1336:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #46 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/46/])
Update docs for TIKA-1335 TIKA-1336. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1602618)
* /tika/trunk/CHANGES.txt
- fix for TIKA-1336 This closes #10 (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1602617)
* /tika/trunk/tika-server/README
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/DetectorResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/DetectorResourceTest.java
* /tika/trunk/tika-server/src/test/resources/CDEC_WEATHER_2010_03_02
* /tika/trunk/tika-server/src/test/resources/foo.csv







[jira] [Commented] (TIKA-1335) mime type for CSV files incorrectly detected as text/plain

2014-06-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031674#comment-14031674
 ] 

Hudson commented on TIKA-1335:
--

SUCCESS: Integrated in tika-trunk-jdk1.7 #46 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/46/])
Update docs for TIKA-1335 TIKA-1336. (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1602618)
* /tika/trunk/CHANGES.txt







[jira] [Commented] (TIKA-1336) Provide a Detector JAXRS endpoint

2014-06-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031662#comment-14031662
 ] 

Hudson commented on TIKA-1336:
--

SUCCESS: Integrated in tika-trunk-jdk1.6 #45 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/45/])
- fix for TIKA-1336 This closes #10 (mattmann: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1602617)
* /tika/trunk/tika-server/README
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/DetectorResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaServerCli.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/DetectorResourceTest.java
* /tika/trunk/tika-server/src/test/resources/CDEC_WEATHER_2010_03_02
* /tika/trunk/tika-server/src/test/resources/foo.csv







[jira] [Commented] (TIKA-1336) Provide a Detector JAXRS endpoint

2014-06-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031661#comment-14031661
 ] 

ASF GitHub Bot commented on TIKA-1336:
--

Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/10







[GitHub] tika pull request: Fix for TIKA-1336: initial working detect strea...

2014-06-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/tika/pull/10


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (TIKA-1335) mime type for CSV files incorrectly detected as text/plain

2014-06-14 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved TIKA-1335.
-

   Resolution: Fixed
Fix Version/s: 1.6

- per issue comments






[jira] [Resolved] (TIKA-1336) Provide a Detector JAXRS endpoint

2014-06-14 Thread Chris A. Mattmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann resolved TIKA-1336.
-

   Resolution: Fixed
Fix Version/s: 1.6

Committed an initial version of the JAX-RS detect interface in r1602617 from
GitHub PR #10. Also updated the docs on the wiki and in the README file. You
have to give it a hint on CSVs by providing the filename in the
Content-Disposition header, but maybe later we can improve our media type
detection for CSV files to differentiate them from text/plain. Thanks to
[~gagravarr] and [~kxu] for the motivation in getting this done.
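On the server side, the hint mechanism described above comes down to reading a filename from the Content-Disposition header and letting its extension refine a generic content-based answer. The Java sketch below shows that idea only; it is hypothetical, is not the code in DetectorResource.java, and its extension table is an illustrative subset rather than Tika's media type registry.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Tika's DetectorResource): take the filename from a
// Content-Disposition header and use its extension to refine text/plain.
final class FilenameHint {

    // Matches filename=foo.csv or filename="foo.csv"; RFC 5987 filename* is not handled.
    private static final Pattern FILENAME =
            Pattern.compile("filename\\s*=\\s*(\"([^\"]*)\"|[^;\\s]+)", Pattern.CASE_INSENSITIVE);

    // Illustrative subset only; a real registry has many more globs.
    private static final Map<String, String> BY_EXTENSION =
            Map.of("csv", "text/csv", "tsv", "text/tab-separated-values");

    // Returns the filename from the header, or null if there is none.
    static String filename(String contentDisposition) {
        if (contentDisposition == null) return null;
        Matcher m = FILENAME.matcher(contentDisposition);
        if (!m.find()) return null;
        return m.group(2) != null ? m.group(2) : m.group(1);
    }

    // Only a generic text/plain result is refined; a specific content-based type wins.
    static String refine(String contentType, String contentDisposition) {
        String name = filename(contentDisposition);
        if (!"text/plain".equals(contentType) || name == null) return contentType;
        int dot = name.lastIndexOf('.');
        if (dot < 0) return contentType;
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, contentType);
    }
}
```

With this rule, foo.csv sent with its name becomes text/csv, while a file named like the CDEC_WEATHER_2010_03_02 attachment, which has no extension, stays text/plain.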






[jira] [Commented] (TIKA-1335) mime type for CSV files incorrectly detected as text/plain

2014-06-14 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031653#comment-14031653
 ] 

Chris A. Mattmann commented on TIKA-1335:
-

[~kxu] see the update I committed in TIKA-1336 and the docs on the wiki for
JaxRS: https://wiki.apache.org/tika/TikaJAXRS
I think this should take care of your detections for now, so long as you
provide the filename hint in the Content-Disposition header. I'm marking this
as resolved for now; please open a new, more specific issue if this doesn't
fix your problem.






[jira] [Commented] (TIKA-1336) Provide a Detector JAXRS endpoint

2014-06-14 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031649#comment-14031649
 ] 

Chris A. Mattmann commented on TIKA-1336:
-

Docs updated for the resource in: https://wiki.apache.org/tika/TikaJAXRS






[jira] [Commented] (TIKA-1336) Provide a Detector JAXRS endpoint

2014-06-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031647#comment-14031647
 ] 

ASF GitHub Bot commented on TIKA-1336:
--

GitHub user chrismattmann opened a pull request:

https://github.com/apache/tika/pull/10

Fix for TIKA-1336: initial working detect stream interface, along with unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chrismattmann/tika TIKA-1336

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10


commit b315a1797f2ffb0f83abdcff051d110facf2a128
Author: Chris Mattmann 
Date:   2014-06-14T18:25:21Z

Fix for TIKA-1336: initial working detect stream interface, along with unit 
tests.









[GitHub] tika pull request: Fix for TIKA-1336: initial working detect strea...

2014-06-14 Thread chrismattmann
GitHub user chrismattmann opened a pull request:

https://github.com/apache/tika/pull/10

Fix for TIKA-1336: initial working detect stream interface, along with unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chrismattmann/tika TIKA-1336

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tika/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10


commit b315a1797f2ffb0f83abdcff051d110facf2a128
Author: Chris Mattmann 
Date:   2014-06-14T18:25:21Z

Fix for TIKA-1336: initial working detect stream interface, along with unit 
tests.






[jira] [Commented] (TIKA-1335) mime type for CSV files incorrectly detected as text/plain

2014-06-14 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031614#comment-14031614
 ] 

Chris A. Mattmann commented on TIKA-1335:
-

Well, I'll try to make some progress either way.






[jira] [Commented] (TIKA-1335) mime type for CSV files incorrectly detected as text/plain

2014-06-14 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031492#comment-14031492
 ] 

Nick Burch commented on TIKA-1335:
--

I don't think we're going to be able to write mime matchers that reliably
detect CSV, not least because there are so many variants of it (tab? comma?
quoted? double-quoted? escaped?).
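To make the point concrete, here is a deliberately simple content heuristic in Java (not anything Tika ships): a candidate delimiter is accepted if it appears the same non-zero number of times, outside double quotes, on every non-empty line of a sample. The candidate list and the two-line minimum are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative heuristic only (not Tika's detector): guess a delimiter by
// requiring a consistent, non-zero count on every sampled line.
final class DelimiterSniffer {

    private static final char[] CANDIDATES = {',', '\t', ';', '|'};

    // Counts delimiters outside double-quoted sections. A doubled "" inside a
    // quoted field toggles the state twice, so it has no net effect.
    static int countOutsideQuotes(String line, char delim) {
        int count = 0;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == delim && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    // Returns the first candidate with the same count >= 1 on all non-empty
    // lines (at least two lines required), or null if none qualifies.
    static Character sniff(String sample) {
        List<String> lines = new ArrayList<>();
        for (String l : sample.split("\\R")) {
            if (!l.isEmpty()) lines.add(l);
        }
        if (lines.size() < 2) return null;
        for (char d : CANDIDATES) {
            int expected = countOutsideQuotes(lines.get(0), d);
            if (expected == 0) continue;
            boolean consistent = true;
            for (String l : lines) {
                if (countOutsideQuotes(l, d) != expected) {
                    consistent = false;
                    break;
                }
            }
            if (consistent) return d;
        }
        return null;
    }
}
```

It copes with comma, tab and quoted fields containing doubled quotes, but it also reports ',' for two lines of ordinary prose such as "Hello, world" and "Goodbye, moon", and splitting on line breaks mishandles quoted fields that contain newlines. Each further variant needs another special case, which is the difficulty described in the comment.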



