Hi
Does the PdfContentsTokenizer read one line at a time? I'm just trying to 
understand how it picks up its data (into the PdfVariant out parameter).
The reason I ask is the code below (from TextExtractor.cpp in the 
'TextExtractorPODOFO' example). 
It seems that once "Tf" is found, I should be able to extract the current 
font size and the font name. This suggests that the whole line is picked up as 
a single token and the information is extracted from that. Is that correct?
See the extract below from a PDF file I've used. Clearly there is 
white space between the 'Tf', the font name F8 and the size 1. How, then, are 
they extracted? I thought the tokenizer would pick up the 'Tf', then the '1', 
then the 'F8' in separate loops.
BT

/F8 1 Tf

TextExtractor.cpp, lines 97 to 108:

    if( bTextBlock )
    {
        if( strcmp( pszToken, "Tf" ) == 0 ) // Set text font and size
        {
            dCurFontSize = stack.top().GetReal();
            stack.pop();
            PdfName fontName = stack.top().GetName();
            PdfObject* pFont = pPage->GetFromResources( PdfName("Font"), fontName );
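
The stack in the excerpt above hints at a different reading: each token may arrive in its own loop iteration, with operands parked on a stack until the operator that consumes them shows up. The sketch below is a standalone model of that idea, not PoDoFo code. It assumes tokens are separated only by white space, and the names FontState and scanForTf are hypothetical.

```cpp
#include <sstream>
#include <stack>
#include <string>

// Hypothetical result type for this sketch; not part of PoDoFo.
struct FontState {
    std::string name;   // font resource name without the leading '/'
    double size = 0.0;  // font size operand of Tf
    bool found = false; // true once a Tf operator has been processed
};

// Simplified model of a content stream scan: one token per loop
// iteration, operands pushed on a stack, operators consume them.
FontState scanForTf(const std::string& content) {
    std::istringstream in(content);
    std::stack<std::string> operands;
    FontState state;
    std::string token;

    while (in >> token) {                      // white space separates tokens
        if (token == "Tf") {
            if (operands.size() >= 2) {
                // The size was pushed last, so it is on top.
                state.size = std::stod(operands.top());
                operands.pop();
                // The font name was pushed before the size.
                std::string name = operands.top();
                operands.pop();
                if (!name.empty() && name[0] == '/') {
                    name.erase(0, 1);          // "/F8" becomes "F8"
                }
                state.name = name;
                state.found = true;
            }
        } else if (token == "BT" || token == "ET") {
            // Operators without operands: drop anything left over.
            while (!operands.empty()) {
                operands.pop();
            }
        } else {
            operands.push(token);              // anything else is an operand
        }
    }
    return state;
}
```

In this model, scanForTf("BT\n\n/F8 1 Tf\n") sees four tokens (BT, /F8, 1, Tf) and ends with the name F8 and the size 1, which would explain why the size is read from the top of the stack before the name.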
         