Hi
Does the PdfContentsTokenizer read one line at a time? I'm just trying to
understand how it picks up its data (into the PdfVariant out parameter).
The reason I ask is the code below (from TextExtractor.cpp in the
'TextExtractorPODOFO' example).
It seems that if "Tf" is found then I should be able to extract the current
font size and the font name. This suggests that the whole line is picked up as
a single token and the information is extracted from that. Is that correct?
See the extract below from a PDF file I've used. Clearly there is
whitespace between the 'Tf', the font F8 and the size 1, so how is it
extracted? I thought the tokenizer would pick up the 'Tf', then the '1', then
the 'F8' in separate loops.
BT
/F8 1 Tf
TextExtractor.cpp, lines 97 to 108:
if( bTextBlock )
{
    if( strcmp( pszToken, "Tf" ) == 0 ) // Set text font and size
    {
        dCurFontSize = stack.top().GetReal();
        stack.pop();
        PdfName fontName = stack.top().GetName();
        PdfObject* pFont = pPage->GetFromResources( PdfName("Font"), fontName );
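To make the question concrete, here is a standalone sketch of how I guess this could work, based on the stack.top() and stack.pop() calls in the excerpt: tokens are read one at a time, operands such as /F8 and 1 are pushed onto a stack, and when the "Tf" keyword arrives the operands are popped in reverse order. This is not PoDoFo code. FontState and FindFontOperator are names I made up for illustration, and splitting on whitespace is a simplification of real PDF tokenizing.

```cpp
#include <cstdlib>
#include <sstream>
#include <stack>
#include <string>

// Hypothetical result type for this sketch; not part of PoDoFo.
struct FontState {
    std::string fontName;
    double fontSize = 0.0;
    bool found = false;
};

// Reads whitespace-separated tokens one per loop iteration.
// Operands are pushed onto a stack; the "Tf" operator pops them.
FontState FindFontOperator(const std::string& content)
{
    std::istringstream in(content);
    std::stack<std::string> operands;
    FontState state;
    std::string token;

    while (in >> token) {
        if (token == "Tf") {
            if (operands.size() < 2) {
                continue; // malformed input: not enough operands
            }
            // The size was pushed last, so it is on top of the stack.
            state.fontSize = std::strtod(operands.top().c_str(), nullptr);
            operands.pop();
            // The font name was pushed before the size.
            std::string name = operands.top();
            operands.pop();
            if (!name.empty() && name[0] == '/') {
                name.erase(0, 1); // drop the leading '/' of a PDF name
            }
            state.fontName = name;
            state.found = true;
        } else if (token == "BT" || token == "ET") {
            // Operators without operands: nothing to push.
        } else {
            operands.push(token); // treat everything else as an operand
        }
    }
    return state;
}
```

Calling FindFontOperator("BT\n/F8 1 Tf\n") on the extract above would go through the loop four times (BT, /F8, 1, Tf) and only fill in the font name and size on the last pass. If that matches what the tokenizer does, then line breaks would not matter at all.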
_______________________________________________
Podofo-users mailing list
Podofo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/podofo-users