If you use the new Apache Tika functions in GSearch 2.4.1, you can specify a limit on the number of characters indexed (or just use the default of 100000 characters); the remaining part of the PDF file is then ignored. This guards you against very large PDF files. If you use the original indexing functions in GSearch, there is no limit except your system resources. Both sets of functions call PDFBox 1.6.0.
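
To illustrate what such a character limit means in practice, here is a minimal sketch in plain Java. It is not GSearch or Tika code; the class name `CharLimitSketch`, the method `readUpTo`, and the constant `DEFAULT_LIMIT` are hypothetical names made up for this example, and the only figure taken from the message above is the 100000-character default. The sketch reads extracted text up to the limit and leaves everything after it unread.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

// Illustration only: a stand-in for "index at most N characters, ignore the rest".
final class CharLimitSketch {

    // Assumption: mirrors the default of 100000 characters mentioned above.
    static final int DEFAULT_LIMIT = 100_000;

    // Reads at most `limit` characters from `text`; anything beyond the limit is left unread.
    static String readUpTo(Reader text, int limit) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[4096];
        while (out.length() < limit) {
            // Never ask for more characters than the limit still allows.
            int wanted = Math.min(buffer.length, limit - out.length());
            int n = text.read(buffer, 0, wanted);
            if (n < 0) {
                break; // input ended before the limit was reached
            }
            out.append(buffer, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the text extracted from a very large PDF file.
        char[] extracted = new char[250_000];
        Arrays.fill(extracted, 'x');

        String indexed = readUpTo(new CharArrayReader(extracted), DEFAULT_LIMIT);
        System.out.println(indexed.length()); // prints 100000
    }
}
```

With a limit like this, memory use for the indexed text is bounded by the limit rather than by the size of the PDF file, which is the protection described above.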

Gert

On 01/05/2012, at 20.59, Chalk, Stuart wrote:

> Can anyone tell me the file size limitations on PDF files indexed by FGS
> (using lucene)? Also, what version of PDF does it handle?
>
> Stuart Chalk, Ph.D.
> Associate Professor of Chemistry
> Department of Chemistry, Building 50, Room 3514,
> University of North Florida
> 1 UNF Drive, Jacksonville, FL 32224 USA
> P: 904-620-1938
> F: 904-620-3535
> E: [email protected]
> W: http://www.unf.edu/coas/chemistry/
>
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
