I've tested the lastest patch. It works fine but I noticed about a 50% slow down with pdfs containing a lot of JBIG2 images.
I've uploaded a couple of sample pdfs here: http://people.freedesktop.org/~ajohnson/jbig2/ Below are the timings for each document using git master and git master with the jbig2 patch. I used the command line pdftocairo -pdf input.pdf output.pdf - AnnualReport.pdf master real 0m13.562s user 0m12.620s sys 0m0.820s jbig2 real 0m21.045s user 0m20.000s sys 0m1.060s - The_Miseries_of_Human_Life.pdf master real 0m32.690s user 0m29.716s sys 0m2.832s jbig2 real 0m42.793s user 0m39.680s sys 0m2.848s While it is still worth a small performance reduction to get the significant reduction in the output file size, it may be worth seeing if there is a simple solution to the slowdown. My first guess is to try attaching the global data to only one image instead of to every image. On 20/01/15 02:42, suzuki toshiya wrote: > Dear Carlos, > > Sorry! I had a silly mistake, I should pass ref->getRef() > (in my previous patch, I passed ref->isRef()). > I attach corrected patch. > > Regards, > mpsuzuki > > suzuki toshiya wrote: >> Dear Carlos, >> >> Thank you for finding out the points! I attached revised patch. >> The difference from my previous revision is following. >> >> Regards, >> mpsuzuki >> >> >> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc >> index a428d2c..17b6b3b 100644 >> --- a/poppler/CairoOutputDev.cc >> +++ b/poppler/CairoOutputDev.cc >> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState *state, >> Stream *str, Object *ref, >> #endif >> >> if (getStreamData (str->getNextStream(), &strBuffer, &len)) { >> - cairo_status_t status; >> + cairo_status_t status = CAIRO_STATUS_SUCCESS; >> >> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2) >> if (ref && ref->isRef()) { >> - Ref imgRef = ref->getRef(); >> status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID, >> - "poppler-surface-", imgRef); >> + "poppler-surface-", ref->isRef()); >> } >> #endif >> if (!status) { >> >> >> >> Carlos Garcia Campos wrote: >>> Adrian Johnson <[email protected]> writes: >>> >>>> The latest patch looks good. I'll wait a week before committing to see >>>> if anyone else has any comments. This will give me time to do some >>>> testing of the patch. >>> Looks good to me as well in general. There's only one thing I've >>> noticed. >>> >>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState *state, >>>> Stream *str, Object *ref, >>>> if (!colorMapHasIdentityDecodeMap(colorMap)) >>>> return; >>>> >>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0) >>>> + if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str, image)) >>>> + return; >>>> +#endif >>>> + >>>> if (getStreamData (str->getNextStream(), &strBuffer, &len)) { >>>> - cairo_status_t st; >>>> + cairo_status_t status; >>>> >>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2) >>>> if (ref && ref->isRef()) { >>>> Ref imgRef = ref->getRef(); >>>> - GooString *surfaceId = new GooString("poppler-surface-"); >>>> - surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num); >>>> - char *idBuffer = copyString(surfaceId->getCString()); >>>> - st = cairo_surface_set_mime_data (image, CAIRO_MIME_TYPE_UNIQUE_ID, >>>> - (const unsigned char *)idBuffer, >>>> - surfaceId->getLength(), >>>> - gfree, idBuffer); >>>> - if (st) >>>> - gfree(idBuffer); >>>> - delete surfaceId; >>>> + status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID, >>>> + "poppler-surface-", imgRef); >>> We could pass ref->getRef() directly here. >>> >>>> } >>>> #endif >>>> + if (!status) { >>> status might be uninitialized at this point if cairo version is not >>> recent enough or the ref && ref->isRef() condition is False. >>> >>>> + status = cairo_surface_set_mime_data (image, mime_type, >>>> + (const unsigned char *)strBuffer, >>>> len, >>>> + gfree, strBuffer); >>>> + } >>> >>>> - st = cairo_surface_set_mime_data (image, >>>> - str->getKind() == strDCT ? >>>> - CAIRO_MIME_TYPE_JPEG : >>>> CAIRO_MIME_TYPE_JP2, >>>> - (const unsigned char *)strBuffer, len, >>>> - gfree, strBuffer); >>>> - if (st) >>>> + if (status) >>>> gfree (strBuffer); >>>> } >>>> } >>> >>>> On 11/01/15 23:05, suzuki toshiya wrote: >>>>> Dear Adrian, >>>>> >>>>> OK - considering that there is a lag to the day that Linux >>>>> distribution ship the poppler stable branch including >>>>> CairoOutputDev with JBIG2 support even if I support some >>>>> snapshot of Cairo in current development branch of poppler >>>>> (and there would be no need to support Cairo snapshot in >>>>> the day), I decided to follow original version checking. >>>>> >>>>> Also I removed the default value (NULL) for Ref object to >>>>> JBIG2Stream constructor and removed NULL pointer checking >>>>> in JBIG2Stream constructor (as same as the argument to pass >>>>> Globals object). >>>>> >>>>> I replaced "st" in CairoOutputDev, by "status". >>>>> >>>>> Here I attached revised patch - oh, I'm quite sorry, my >>>>> previous posts were misunderstanding this year as 2014. >>>>> >>>>> Regards, >>>>> mpsuzuki >>>>> >>>>> Adrian Johnson wrote: >>>>>> On 08/01/15 13:35, suzuki toshiya wrote: >>>>>>> Dear Adrian, >>>>>>> >>>>>>> Great Thank you for prompt review! I attached the revised patch, >>>>>>> but please let me discuss a few points. >>>>>>> >>>>>>> Adrian Johnson wrote: >>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> Here I submit a preliminary patch for git head, >>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo. >>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum, int >>>>>>>> refGen, >>>>>>>> + cairo_surface_t >>>>>>>> *image) >>>>>>>> +{ >>>>>>>> + GooString *globalsStrId; >>>>>>>> + char *idBuffer; >>>>>>>> + cairo_status_t st; >>>>>>>> + >>>>>>>> + >>>>>>>> + globalsStrId = new GooString; >>>>>>>> + globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen); >>>>>>>> + idBuffer = copyString(globalsStrId->getCString()); >>>>>>>> + st = cairo_surface_set_mime_data (image, >>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID, >>>>>>>> >>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be >>>>>>>> inside a >>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0). >>>>>>> If it's acceptable, I want to use the conditional like >>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID >>>>>>> The official release of cairo including JBIG2 support is 1.14.0, >>>>>>> however, some distributions use a snapshot before 1.14 which >>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support) >>>>>>> ships with 1.13.0~20140204. >>>>>>> >>>>>>> http://packages.ubuntu.com/search?suite=trusty§ion=all&arch=any&keywords=libcairo2&searchon=names >>>>>>> >>>>>>> >>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2 >>>>>>> macros, instead of the version checking. >>>>>> I'm not keen on making an exception for this case. The target audience >>>>>> for this is very small. It would only help users that compile a new >>>>>> version of poppler on an old distribution without also recompiling >>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature that not >>>>>> many people would be using (so far no one else has requested this >>>>>> feature). >>>>>> >>>>>> If you really want to support distributions using a cairo 1.13 snapshot >>>>>> I suggest adding a comment before the first occurrence of >>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the version macro >>>>>> is not used. Also include >>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the comment so >>>>>> that: >>>>>> a) searching for CAIRO_VERSION will find it, and >>>>>> b) to document that JBIG2 support is in 1.14 or later. >>>>>> >>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax >>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular >>>>>>> syntax. Is there any reason why the order is reverted? >>>>>> No reason. Feel free to change it to num-gen order. >>>>>> >>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream *str, >>>>>>>> + cairo_surface_t >>>>>>>> *image) >>>>>>>> +{ >>>>>>>> + JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str); >>>>>>>> + Object* globalsStr = jbig2Str->getGlobalsStream(); >>>>>>>> + char *globalsBuffer; >>>>>>>> + int globalsLength; >>>>>>>> + >>>>>>>> + // According to PDF spec, Globals should be stream (or nothing) >>>>>>>> + >>>>>>>> + if (globalsStr->isNull()) >>>>>>>> + return gTrue; >>>>>>>> + >>>>>>>> + if (!globalsStr->isStream()) >>>>>>>> + return gFalse; >>>>>>>> + >>>>>>>> + // See JBIG2Stream constructor for valid reference number >>>>>>>> + if (!jbig2Str->getGlobalsStreamRefNum()) >>>>>>>> + return gFalse; >>>>>>>> >>>>>>>> The globalsStr->isStream() check should be sufficient. The reference >>>>>>>> number should always be valid of the global stream is valid. >>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to >>>>>>> it must be always valid as far as GlobalsStream is used" is >>>>>>> intuitive solution, but public JBIG2Stream constructor has not >>>>>>> requested the Ref to GlobalsStream before in earlier version. >>>>>>> >>>>>>> If some developers use JBIG2Stream constructor with valid >>>>>>> GlobalsStream but without Ref to it (as the syntax before my >>>>>>> patch), there is the situation that JBIG2Stream::globalsStream >>>>>>> is valid but Ref to it is invalid (if I supply the default >>>>>>> value for the argument to pass Ref). How do we detect or prevent >>>>>>> such case? >>>>>>> >>>>>>> A) Supplying the default value (NULL pointer) in the argument >>>>>>> to pass the Ref, and JBIG2Stream returns an error when >>>>>>> globalsStream is valid stream but its Ref is not passed - and >>>>>>> prevent the case that globalsStream is valid but its Ref is >>>>>>> invalid. >>>>>>> >>>>>>> B) Not supplying the default value, and prevent the construction >>>>>>> without the Ref to the globalsStream. >>>>>>> >>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if >>>>>>> Ref to globalsStream is explicitly set. >>>>>>> >>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535) >>>>>>> internally when Ref is not passed, and deal the case as Ref to >>>>>>> the globalsStream is not initialized (my last patch design) >>>>>>> >>>>>>> Which is the best? Or, no need to case such case? >>>>>> JBIG2Stream is not part of the public API so there is no need to >>>>>> preserve compatibility. >>>>>> >>>>>> Something else I just noticed - could you rename "st" to "status". >>>>>> CairoOutputdev.cc always uses the name "status" for cairo status so we >>>>>> should be consistent with this convention. >>>>>> >>>> _______________________________________________ >>>> poppler mailing list >>>> [email protected] >>>> http://lists.freedesktop.org/mailman/listinfo/poppler >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> poppler mailing list >> [email protected] >> http://lists.freedesktop.org/mailman/listinfo/poppler > > > > _______________________________________________ > poppler mailing list > [email protected] > http://lists.freedesktop.org/mailman/listinfo/poppler > _______________________________________________ poppler mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/poppler
