[jira] [Commented] (TIKA-1195) XLSB support

2017-04-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974972#comment-15974972
 ] 

Hudson commented on TIKA-1195:
--

SUCCESS: Integrated in Jenkins build tika-2.x-windows #198 (See [https://builds.apache.org/job/tika-2.x-windows/198/])
TIKA-1195 and TIKA-2329, upgrade to POI 3.16-final and add xlsb parser (tallison: rev a847a863d1e25a9ba8209cd28c3e98be153f34a5)
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParser.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java
* (edit) CHANGES.txt
* (edit) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
* (edit) tika-parser-modules/pom.xml
* (add) tika-test-resources/src/test/resources/test-documents/testEXCEL_various.xlsb
* (add) tika-parser-modules/tika-parser-office-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFBExcelExtractorDecorator.java


> XLSB support
> 
>
> Key: TIKA-1195
> URL: https://issues.apache.org/jira/browse/TIKA-1195
> Project: Tika
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 1.4
> Environment: W2008R2
>Reporter: Frederic Ronny
>  Labels: new-parser
> Fix For: 2.0, 1.15
>
>
> We use ManifoldCF 1.3 and Solr 4.4 to index a shared network drive. This works 
> fine for most of our Office file types (docx, xlsx, ...), but we also have 
> many xlsb files, which are not among the supported file types. 
> To keep using this solution, it is essential to us that xlsb support be 
> provided in the future.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (TIKA-1195) XLSB support

2017-04-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974823#comment-15974823
 ] 

Hudson commented on TIKA-1195:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1240 (See [https://builds.apache.org/job/Tika-trunk/1240/])
TIKA-1195 and TIKA-2329 (tallison: [https://github.com/apache/tika/commit/67612b8f805ad5d1085db14922d3b3b6ddce19bf])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParser.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ExcelParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
* (edit) CHANGES.txt
* (edit) tika-parsers/pom.xml
* (add) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFBExcelExtractorDecorator.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java




[jira] [Commented] (TIKA-1195) XLSB support

2017-03-17 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929700#comment-15929700
 ] 

Tim Allison commented on TIKA-1195:
---

I think so, but It Depends (TM).  I figure the next version of POI will be out 
in early/mid April, and then we could shoot for 1.15.  All depends on our 
communities of devs, though.

Next version of PDFBox should be out within the next few days.



[jira] [Commented] (TIKA-1195) XLSB support

2017-03-16 Thread Matthew Caruana Galizia (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929105#comment-15929105
 ] 

Matthew Caruana Galizia commented on TIKA-1195:
---

[~talli...@mitre.org] d'you reckon that will be out with Tika 1.15?



[jira] [Commented] (TIKA-1195) XLSB support

2017-03-16 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928636#comment-15928636
 ] 

Tim Allison commented on TIKA-1195:
---

Basic xlsb streaming/read-only support was just added to POI and should be 
available with 3.15-beta3.



[jira] [Commented] (TIKA-1195) XLSB support

2017-03-06 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898084#comment-15898084
 ] 

Tim Allison commented on TIKA-1195:
---

https://bz.apache.org/bugzilla/show_bug.cgi?id=60826



[jira] [Commented] (TIKA-1195) XLSB support

2017-03-06 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897353#comment-15897353
 ] 

Tim Allison commented on TIKA-1195:
---

From [~mcaruanagalizia] via Twitter, an ASL 2.0 licensed JavaScript xlsb 
parser: https://github.com/SheetJS/js-xlsx



[jira] [Commented] (TIKA-1195) XLSB support

2016-01-05 Thread Dominik Stadler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083679#comment-15083679
 ] 

Dominik Stadler commented on TIKA-1195:
---

Some official description is at 
https://msdn.microsoft.com/en-us/library/office/cc313133(v=office.12).aspx



[jira] [Commented] (TIKA-1195) XLSB support

2015-03-16 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364023#comment-14364023
 ] 

Nick Burch commented on TIKA-1195:
--

There is no POI support as yet. Supporting it properly will take a non-trivial 
amount of work (single-digit days at minimum, maybe just into double-digit 
days), and I'd guess at least a day for a hacky solution. Thus far, there are 
no takers to sponsor or do the work involved.
