You continue to think that PDF is a format that is JUST about presentation (aka 
"pictures") - but if you look at the specification you will find that it 
provides for BOTH presentation AND semantics/data.  

For example, those airplane manuals that I mention contain NOT JUST the 
rendered view of each part, but the FULL CAD "database" of information about 
each part - which is FULLY accessible via PDF tools such as Adobe Reader.  To 
see it for yourself, grab the PDF at 
<http://acroeng.adobe.com/Test_Files/structure/OfficeFloorPlan.pdf> and open it 
up in Adobe Reader 7 or later.  Use the "Object Data" tool and click on any of 
the objects (rooms, stairs, etc.).  You will see the internal information, 
directly from the CAD system, about that object.  Nothing private or 
proprietary - just using standard features of PDF that anyone can access using 
whatever tools they wish.  

And that's just a 2D example - you can do the same thing with 3D in PDF as 
well.  Check out <http://acroeng.adobe.com/leonardr/3DReviewerSample.pdf>.  
(NOTE: I think that sample may require Reader 8.1 or later).  That example 
demonstrates various types of 3D information including PMI (Product 
Manufacturing Information), which is what automated production systems use to 
actually PRODUCE 3D objects.  

Don't get me wrong - XML is great!  PDF supports native XML as well as XML-like 
concepts in various places in the standard - but XML is also quite limited, 
which is why you combine it with other things to arrive at the richness that is 
PDF. 
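
One concrete place where XML lives inside a PDF is the XMP metadata packet, 
which is usually stored uncompressed so that even non-PDF tools can scan for 
it.  The sketch below is only an illustration under that assumption: it looks 
for the first packet between the "xpacket" markers and reads the Dublin Core 
title.  The sample bytes and the title in them are made up, and a real reader 
should go through a PDF library rather than a regex.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces used by RDF and Dublin Core inside an XMP packet.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# An XMP packet sits between an "xpacket begin" and an "xpacket end"
# processing instruction; capture everything in between.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def extract_xmp(pdf_bytes):
    """Return the first uncompressed XMP packet as an Element, or None."""
    match = PACKET_RE.search(pdf_bytes)
    if match is None:
        return None
    return ET.fromstring(match.group(1).strip())


def dc_title(xmp_root):
    """Return the first dc:title entry, or None if there is none."""
    item = xmp_root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return item.text if item is not None else None


# Hypothetical fragment of a PDF file; only the packet matters here.
SAMPLE = b"""%PDF-1.7
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt>
    <rdf:li xml:lang="x-default">Office Floor Plan</rdf:li>
   </rdf:Alt></dc:title>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
"""

if __name__ == "__main__":
    print(dc_title(extract_xmp(SAMPLE)))
```

If the metadata stream has been compressed or the file is encrypted, the scan 
finds nothing and extract_xmp returns None.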

Also, don't confuse "open" with "free" - the two aren't the same nor should 
they be.  However, you can get a free copy of ISO 32000, as published by Adobe 
Systems at <http://www.adobe.com/devnet/pdf/pdf_reference.html>.

As far as what the US government chooses to do with their data - I have no 
control over that.  I can only ensure that the tools and technologies are there 
to enable them to distribute both presentation and semantic information in an 
industry standard format.

Leonard Rosenthol
PDF Standards Architect
Adobe Systems

-----Original Message-----
From: Mike Marchywka [mailto:marchy...@hotmail.com] 
Sent: Tuesday, March 10, 2009 10:00 AM
To: itext-questions@lists.sourceforge.net
Subject: Re: [iText-questions] PDF "philosophy" (was RE: modifed sample, 
question on PDF contents)




----------------------------------------
> From: lrose...@adobe.com
> To: itext-questions@lists.sourceforge.net
> Date: Tue, 10 Mar 2009 06:23:50 -0700
> Subject: [iText-questions] PDF "philosophy" (was RE: modifed sample, question 
> on PDF contents)
>
> In Tagged PDF, these elements can be grouped together into logical blocks, 
> such as "/H1 BMC BT 1 0 0 1 10 10 Tm (Some text goes here) Tj ET EMC". In 
> this example, I made that text an "H1" (aka Heading Level 1, just like HTML). 
> So the syntax is different, but the concepts are the same.
>
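To make the marked-content idea concrete: content-stream operators are 
postfix, so a tagged run is written as "/H1 BMC ... EMC" with the text-showing 
operator in between.  The following is a toy sketch, not how any real library 
does it.  It assumes an uncompressed content stream, handles only BMC/EMC and 
Tj, and leaves string escapes undecoded.  Real Tagged PDF normally uses BDC 
with an MCID that points into the structure tree, which this ignores.

```python
import re

# Tokens: a /Name, a (literal string), or any other run of non-space
# characters (numbers and operators).
TOKEN_RE = re.compile(r"/[^\s/()\[\]<>]+|\((?:[^()\\]|\\.)*\)|[^\s()]+")


def is_operand(token):
    """Names, strings and numbers are operands; anything else is an operator."""
    if token[0] in "/(":
        return True
    try:
        float(token)
        return True
    except ValueError:
        return False


def tagged_text(content):
    """Yield (tag, text) for each Tj that occurs inside a BMC ... EMC pair."""
    stack = []     # tags of the marked-content sequences currently open
    operands = []  # operands collected since the last operator
    for token in TOKEN_RE.findall(content):
        if token == "BMC":
            if operands:
                stack.append(operands[-1].lstrip("/"))
            operands.clear()
        elif token == "EMC":
            if stack:
                stack.pop()
            operands.clear()
        elif token == "Tj":
            if stack and operands:
                yield stack[-1], operands[-1][1:-1]  # drop the parentheses
            operands.clear()
        elif is_operand(token):
            operands.append(token)
        else:
            operands.clear()  # some other operator; its operands are consumed


if __name__ == "__main__":
    stream = "/H1 BMC BT 1 0 0 1 10 10 Tm (Some text goes here) Tj ET EMC"
    print(list(tagged_text(stream)))
```

Text shown outside any marked-content sequence is skipped, which is the 
situation a reader faces with untagged PDFs: the characters are there, but 
nothing says what role they play.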

> That's usually because that is how the information is received from the drug 
> company. The FDA doesn't require "computer readable" information and so drug 
> companies aren't going to "give away" their hard earned information if they 
> don't have to.

Yes, this is a political issue, not a technical one. Notice, however, that a
lot of hard-earned information is given away in various readable/workable
formats (see the link I provided earlier, for example). I'm not blaming Adobe,
at least not for this; I'm just giving you some observations. This problem of
"just looking at the pictures" is not unique to PDF; it comes up on most
websites, etc.


>
>>The FCC, last time I looked, even accepts submissions that disallow 
>>extraction of images or text.
>>
> I'd be surprised if that were the case - but I haven't looked recently 
> either...

I was too, until I tried to run pdftotext on some of their filings
a year or so ago. 
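
For context on how a PDF can "disallow" extraction: with the standard security 
handler, the encryption dictionary carries a /P entry, a signed 32-bit integer 
whose bits are permission flags, and conforming tools are expected to honor 
them.  The sketch below decodes those flags using the bit positions from the 
user access permissions table in ISO 32000-1.  The two /P values in it are 
illustrative only and are not taken from any FCC filing.

```python
# Bit positions are 1-based in the specification, so bit n has the
# value 1 << (n - 1).  A set bit means the operation is permitted.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "extract": 5,                    # copy or extract text and graphics
    "annotate_and_fill_forms": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p):
    """Map each permission name to True/False for a given /P value.

    Python's & on negative integers behaves like two's complement,
    so the signed value can be tested directly.
    """
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


if __name__ == "__main__":
    for p in (-3904, -44):  # illustrative values only
        allowed = [name for name, ok in decode_permissions(p).items() if ok]
        print(p, allowed)
```

A value such as -3904 clears every flag above, so a tool that respects the 
flags will refuse to extract text even though the file opens and displays 
normally.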

>
>>
>> And what types of "manipulation" are you expecting? Some documents aren't 
>> designed for manipulation, such as the plans for a Sherman Tank - while 
>> others, such as forms make sense to enable extraction and processing of the 
>> data.
>
> My "plans for a Sherman Tank" example is, believe it or not, a REAL PDF that 
> I have seen at the DOD. Also, companies such as Boeing and Airbus produce 
> manuals in PDF for every plane they build - with full technical 
> drawings of each part. So no - not a flippant example, but a real and true 
> one. However, I agree with you that such information needs to be both human 
> and computer readable - which is why PDF supports BOTH rich rendering AND 
> rich semantics for all forms of content. In fact, it's the ONLY format that 
> supports both! (yes, PDF supports structure and metadata for vector and even 
> 3D information to be incorporated!)


It wouldn't surprise me if companies with models have rendered or distilled
their "plans" into images for manuals. Try building a tank from the user's
manual. LOL. Do any CAD systems store "plans" in PDF format, or do companies
just extract some nice pictures that cannot be used to efficiently build or
design a tank? This is exactly my point: many software vendors confuse
information ("plans") with pictures (the illustrations in a user's guide or
repair manual, and even these could benefit from interaction with things like
OBD in the case of cars). This is great until, in the other examples, you
realize that no independent people can look for questionable drug effects or
examine claims from approved US real estate professionals without retyping
data and other cumbersome procedures.




>
>>I'd like to be able to maintain my own tax information and
>>extract it from a filled in 1040 and not just waste time typing
>>into an information black hole in some proprietary or unworkable
>>format.
>>
> PDF isn't a proprietary format - it's an open international standard (ISO 
> 32000-1). Can't get more "non-proprietary" than that!!

Send me a free copy then, LOL - although I do note you have plenty of free
docs on your site. Fine regarding the proprietary part, but the issue at hand
is how "workable" it is for the many types of information your customers try
to use it with.
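
On the narrower point of getting your own numbers back out of a filled-in 
form: PDF form data can be exported to XFDF, an XML format, and read with 
ordinary XML tools.  Here is a minimal sketch, assuming an XFDF export is 
available.  The field names and values are hypothetical and are not the real 
1040 field names, and only top-level fields are read (XFDF also allows nested 
fields).

```python
import xml.etree.ElementTree as ET

# XFDF elements live in this namespace.
XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}


def read_fields(xfdf_text):
    """Return {field name: value} for the top-level fields of an XFDF export."""
    root = ET.fromstring(xfdf_text)
    values = {}
    for field in root.iterfind("./x:fields/x:field", XFDF_NS):
        value = field.find("x:value", XFDF_NS)
        values[field.get("name")] = value.text if value is not None else None
    return values


# Hypothetical export from a filled-in form.
SAMPLE_XFDF = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <fields>
    <field name="wages"><value>52000</value></field>
    <field name="taxable_interest"><value>125</value></field>
  </fields>
</xfdf>"""

if __name__ == "__main__":
    print(read_fields(SAMPLE_XFDF))
```

Whether a given agency's form permits that export is a policy choice made 
when the form is published, not a limit of the format.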

>
> But on the more general issue, what you are running into are decisions by the 
> government that they can (and do!) make $$ selling the tax tables - and as 
> such, there is no incentive for them to put that information into a format 
> that "just anyone" can access. However, if you do license the information 
> from them - you get it in machine readable format. That's capitalism - not 
> technical ;).

They license machine-readable 1040s? In any case,
this is a political issue. I guess a tax preparation industry
is still an industry, huh? 

I guess I'm worried that the govt seems to have been sold by someone on the
notion that pictures are what Americans need. But to make the most use of any
data, a better format would be something like XML or plain text, not a PDF
file that takes an already large amount of data and adds formatting
information that just needs to be taken out for analysis. There are both
practical and fundamental problems with the format, but people continue to use
it everywhere, assuming that humans will be the only ones who want to look at
the data, and then only as pretty pictures. The SEC seems to have realized
that, but most other groups accept or encourage cave-drawing submissions
rather than computer-usable documents.



>
>
> Leonard
>
>
> -----Original Message-----
> From: Mike Marchywka [mailto:marchy...@hotmail.com]
> Sent: Tuesday, March 10, 2009 8:13 AM
> To: itext-questions@lists.sourceforge.net
> Subject: Re: [iText-questions] modifed sample, question on PDF contents
>
>
>
> As a newcomer to the list, I'm not sure how apropos this
> is, but until I hear otherwise I'll assume it is OK.
> This is probably more political than iText-relevant.
>

------------------------------------------------------------------------------
_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.1t3xt.com/docs/book.php