----------------------------------------
> From: lrose...@adobe.com
> To: itext-questions@lists.sourceforge.net
> Date: Tue, 10 Mar 2009 07:41:50 -0700
> Subject: Re: [iText-questions] PDF "philosophy" (was RE: modifed sample, 
> question on PDF contents)
>
> You continue to think that PDF is a format that is JUST about presentation 
> (aka "pictures") - but if you look at the specification you will find that it 
> provides for BOTH presentation AND semantics/data.

Presentation is all I've ever seen in the government forms. I still
can't figure out how to get the numbers out of the 1040 form.
I'll look at the spec some more, but PDF doesn't look like
it is widely used for that.

>
> And that's just a 2D example - you can do the same thing with 3D in PDF as 
> well. Check out . (NOTE: I think that sample may require Reader 8.1 or 
> later). That example demonstrates various types of 3D information including 
> PMI, which is what is used to actually PRODUCE 3D objects by automated 
> production systems.


Yes, the 3D hierarchical model is nice; I can interact with
it, and it seems to be quite complete. This is not a rendered
"presentation format" or an instruction manual as outlined earlier.
If it is that extensible, then great for Adobe.
I guess if I could get the data out in some format appropriate
to the need, that would be fine.
This is not the collection of rendered images with blobs of glyphs
that I'm dealing with or that began the discussion. If you
preserve the model information, that is great, but all I've seen, and
all we discussed before that, is closer to a BMP file.

>
> Don't get me wrong - XML is great! PDF supports native XML as well as 
> XML-like concepts in various places in the standard - but XML is also quite 
> limited, which is why you combine it with other things to arrive at the 
> richness that is PDF.

Agreed. I'm not even that big a fan of XML, but I have tools
to manipulate it. I still can't figure out how to retrieve the
numbers I typed into the IRS PDF 1040 form, but apparently
there are form export features. This is all I really need:
information import and export through programmatic means.
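
One way to get at the typed-in numbers, assuming the form's data can be
exported as XFDF (the XML form-data format that Acrobat can export; whether
a particular IRS form allows this is not confirmed here), is to read the
export with ordinary XML tools. Below is a minimal Java sketch using only
the JDK. The class name XfdfFields and the field names "wages" and
"interest" are made up for illustration; real 1040 field names will differ.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads field values out of an XFDF export.
class XfdfFields {

    /** Maps each field's dotted name to the text of its direct <value> child. */
    static Map<String, String> parse(String xfdf) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XFDF elements live in a namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xfdf)));

            Map<String, String> values = new LinkedHashMap<>();
            NodeList fields = doc.getElementsByTagNameNS("*", "field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                NodeList children = field.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child instanceof Element && "value".equals(child.getLocalName())) {
                        values.put(qualifiedName(field), child.getTextContent());
                        break; // this sketch keeps only the first value
                    }
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not parseable XFDF", e);
        }
    }

    /** Nested fields are joined with dots, e.g. parent.child. */
    private static String qualifiedName(Element field) {
        String name = field.getAttribute("name");
        Node parent = field.getParentNode();
        while (parent instanceof Element && "field".equals(parent.getLocalName())) {
            name = ((Element) parent).getAttribute("name") + "." + name;
            parent = parent.getParentNode();
        }
        return name;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for a real export.
        String sample =
              "<xfdf xmlns=\"http://ns.adobe.com/xfdf/\">"
            + "<fields>"
            + "<field name=\"wages\"><value>52000</value></field>"
            + "<field name=\"interest\"><value>310</value></field>"
            + "</fields>"
            + "</xfdf>";
        parse(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Once the values are in a plain map, they can go into whatever format suits
the need (CSV, a database, a spreadsheet) without touching the page
rendering at all.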


>
> Also, don't confuse "open" with "free" - the two aren't the same nor should 
> they be. However, you can get a free copy of ISO 32000, as published by Adobe 
> Systems at .
>
> As far as what the US government chooses to do with their data - I have no 
> control over that. I can only ensure that the tools and technologies are 
> there to enable them to distribute both presentation and semantic information 
> in an industry standard format.

Many seem to have gone with pictures and left out the model
information. Rendered text seems to have limited use. 



>
> Leonard Rosenthol
> PDF Standards Architect
> Adobe Systems
>
> -----Original Message-----
> From: Mike Marchywka [mailto:marchy...@hotmail.com]
> Sent: Tuesday, March 10, 2009 10:00 AM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] PDF "philosophy" (was RE: modifed sample, 
> question on PDF contents)
>
>
>
>
> ----------------------------------------
>> From: lrose...@adobe.com
>> To: itext-questions@lists.sourceforge.net
>> Date: Tue, 10 Mar 2009 06:23:50 -0700
>> Subject: [iText-questions] PDF "philosophy" (was RE: modifed sample, 
>> question on PDF contents)
>>
>> In Tagged PDF, these elements can be grouped together into logical blocks, 
>> such as "/H1 BMC 1 0 0 1 10 10 Tm (Some text goes here) Tj EMC". In this 
>> example, I made that text an "H1" (aka Header Level 1, just like HTML). So 
>> syntax is different, but concepts are the same.
>>
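
To make the grouping idea concrete: in a PDF content stream the operands
come before the operator, so a marked-content block reads "/H1 BMC ... EMC".
Below is a rough Java sketch that pulls the tag and the shown text out of
such a fragment. It is a toy under stated assumptions: no nested blocks, no
escaped or nested parentheses in strings, only the Tj operator, and an
already-decoded content stream. The class name MarkedContentSketch is made
up. A real tool would use a proper content-stream parser rather than
regular expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists "tag: text" for each simple marked-content block.
class MarkedContentSketch {

    // "/Tag BMC ... EMC", assuming blocks are not nested.
    private static final Pattern BLOCK =
            Pattern.compile("/(\\w+)\\s+BMC\\b(.*?)\\bEMC\\b", Pattern.DOTALL);

    // "(literal string) Tj", assuming no parentheses inside the string.
    private static final Pattern SHOWN =
            Pattern.compile("\\(([^()]*)\\)\\s*Tj");

    static List<String> taggedText(String content) {
        List<String> result = new ArrayList<>();
        Matcher block = BLOCK.matcher(content);
        while (block.find()) {
            StringBuilder text = new StringBuilder();
            Matcher shown = SHOWN.matcher(block.group(2));
            while (shown.find()) {
                text.append(shown.group(1));
            }
            result.add(block.group(1) + ": " + text);
        }
        return result;
    }

    public static void main(String[] args) {
        String fragment =
                "/H1 BMC BT /F1 12 Tf 1 0 0 1 10 10 Tm (Some text goes here) Tj ET EMC";
        taggedText(fragment).forEach(System.out::println);
    }
}
```

The point of the sketch is only that the tag travels with the text, so a
reader of the file can tell a heading from body text without guessing from
font size or position.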
>
>> That's usually because that is how the information is received from the drug 
>> company. The FDA doesn't require "computer readable" information and so drug 
>> companies aren't going to "give away" their hard earned information if they 
>> don't have to.
>
> Yes, this is a political issue not technical. Notice
> however that there is a lot of hard earned information
> given away in various readable/workable formats ( see my link provided 
> earlier for example ). I'm not blaming Adobe, at least
> for this, just giving you some observations.
> This problem of "just looking at the pictures" is
> not unique to PDF, it comes up on most websites etc.
>
>
>>
>>>The FCC, last time I looked, even accepts submissions that disallow 
>>>extraction of images or text.
>>>
>> I'd be surprised if that were the case - but I haven't looked recently 
>> either...
>
> I was too until I tried to run pdftotext on some of their filings
> a year or so ago.
>
>>
>>>
>>> And what types of "manipulation" are you expecting? Some documents aren't 
>>> designed for manipulation, such as the plans for a Sherman Tank - while 
>>> others, such as forms make sense to enable extraction and processing of the 
>>> data.
>>
>> My "plans for a Sherman Tank" example is, believe it or not, a REAL PDF that 
>> I have seen at the DOD. Also, companies such as Boeing and Airbus also 
>> produce manuals for every plane they produce in PDF - with full technical 
>> drawings of each part. So no - not a flippant example, but a real and true 
>> one. However, I agree with you that such information needs to be both human 
>> and computer readable - which is why PDF supports BOTH rich rendering AND 
>> rich semantics for all forms of content. In fact, it's the ONLY format that 
>> supports both! (yes, PDF supports structure and metadata for vector and even 
>> 3D information to be incorporated!)
>
>
> It wouldn't surprise me if companies with models have
> rendered or distilled their "plans" into images for manuals.
> Try building a tank from the user's manual. LOL.
> Do any CAD systems store "plans" in PDF format or
> do companies extract some nice pictures that can
> not be used to efficiently build or design a tank?
> This is exactly my point: many software vendors
> confuse information ("plans") with pictures (the illustrations
> in a user's guide or repair manual, and even these
> could benefit from interaction with things like OBD in the case of cars).
> This is great until, in the other examples,
> you realize that no independent people can look for questionable drug effects 
> or examine claims from approved US real estate
> professionals without retyping data and other cumbersome procedures.
>
>
>
>
>>
>>>I'd like to be able to maintain my own tax information and
>>>extract it from a filled in 1040 and not just waste time typing
>>>into an information black hole in some proprietary or unworkable
>>>format.
>>>
>> PDF isn't a proprietary format - it's an open international standard (ISO 
>> 32000-1). Can't get more "non-proprietary" than that!!
>
> Send me a free copy then LOL- although I do note you have
> plenty of free docs on your site. Fine regarding the
> proprietary part, but the issue at hand is how "workable"
> is it for the many types
> of information your customers try to use it with.
>
>>
>> But on the more general issue, what you are running into are decisions by 
>> the government that they can (and do!) make $$ selling the tax tables - and 
>> as such, there is no incentive for them to put that information into a 
>> format that "just anyone" can access. However, if you do license the 
>> information from them - you get it in machine readable format. That's 
>> capitalism - not technical ;).
>
> They license machine readable 1040's? In any case,
> this is a political issue. I guess a tax preparation industry
> is still an industry huh?
>
> I guess I'm worried that the govt seems to have been sold
> by someone on the notion that pictures are what Americans need.
> But, to make most use of any data a better format would be
> something like XML or plain text, not
> a PDF file that takes an already large amount of data
> and adds formatting information that just needs to be taken
> out for analysis. There are both practical and fundamental
> problems with the format but people continue to use it
> everywhere assuming the humans will be the only
> people who want to look at the data and then only
> as pretty pictures. The SEC seems to have realized that
> but most other groups accept or encourage cave drawing submissions
> rather than computer usable documents.
>
>
>
>>
>>
>> Leonard
>>
>>
>> -----Original Message-----
>> From: Mike Marchywka [mailto:marchy...@hotmail.com]
>> Sent: Tuesday, March 10, 2009 8:13 AM
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] modifed sample, question on PDF contents
>>
>>
>>
>> As a newcomer to the list I'm not sure how apropos this
>> is but until I hear otherwise I'll assume it is ok.
>> This is probably more political than itext relevant.
>>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php
>
